In the past thirty years of frontier energy collider 
physics, the only objects discovered were those for which 
the predictions were definite. The W and Z bosons
(discovered at CERN in 1983 [1, 2, 3, 4]) and the top quark
(discovered at Fermilab in 1995 [5, 6]) were well known
objects long before their discoveries, with all quantum 
numbers but mass uniquely specified. The present situa- 
tion is qualitatively different, with plausible predictions 
for physics lying beyond the standard model running the 
gamut of possible experimental signatures. 

Searches for physics beyond the standard model typi- 
cally begin with a particular model. A region is selected 
in the data where the model's expected contribution is 
enhanced, and the extent to which the data (dis)favor 
the model is determined by comparing the prediction to 
data. 

The state of the theoretical landscape and the vastness 
of most model spaces suggests the utility of searching in 
a different space altogether. The experimental space, de- 
fined by the isolated and energetic objects observed in 
frontier energy collisions, forms a natural space to con- 
sider. 

This article describes a systematic and model- 
independent look (Vista) at gross features of the data, 
and a quasi-model-independent search (Sleuth) for new 
physics at high transverse momentum. These global al- 
gorithms provide a complementary approach to searches 
optimized for more specific new physics scenarios. 



II. STRATEGY 

The search for new physics described in this article is 
designed with the intention of maximizing the chance for 
discovery, and not excluding model parameter space if no 
discrepancy is found. Discrepancies between data and a 
complete standard model background estimate are iden- 
tified in a global sample of high transverse momentum 
(high-$p_T$) collision events. Three statistics are employed
to identify and quantify disagreement: populations of ex- 
clusive final states defined by the objects the events con- 
tain, shapes of kinematic distributions, and excesses on 
the tail of summed scalar transverse momentum distri- 
butions. 

The Vista [51] algorithm provides a global study of the
standard model prediction and CDF detector response in
the bulk of the high-$p_T$ data; an algorithm called Sleuth
complements this with a search for possibly small cross
section physics in the high-$p_T$ tails. The purpose of these
algorithms is to identify discrepancies worthy of further 
consideration. 

A claim of discovery requires convincing arguments 
that the observed discrepancy between data and stan- 
dard model prediction 



1. is not a statistical fluctuation,

2. is not due to a mismodeling of the detector response, and

3. is not due to an inadequate implementation of the 
standard model prediction, 

and therefore must be due to new underlying physics. 
Any observed discrepancy is subject to scrutiny, and ex- 
planations are sought in terms of the above points. 

The Vista and Sleuth algorithms provide a means for 
making the above three arguments, with a high threshold 
placed on the statistical significance of a discrepancy in 
order to minimize the chance of a false discovery claim. 
As described later, this threshold is the requirement that 
the false discovery rate is less than 0.001, after taking into 
account the total number of final states, distributions, or 
regions being examined. 

This analysis employs a correction model implementing
specific hypotheses to account for mismodeling of detector
response and imperfect implementation of standard
model prediction. Achieving this on the entire high-$p_T$
dataset requires a framework for quickly implementing
and testing modifications to the correction model.
The specific details of the correction model are intention- 
ally kept as simple as possible in the interest of trans- 
parency in the event of a possible new physics claim. 
Vista's toolkit includes a global comparison of data to 
the standard model prediction, with a check of thousands 
of kinematic distributions and an easily adjusted correc- 
tion model allowing a quick fit for values of associated 
correction factors. 

The traditional notions of signal and control regions 
are modified. Without prejudice as to where new physics 
may appear, all regions of the data are treated as both 
signal and control. This analysis is not blind, but rather 
seeks to identify and understand discrepancies between 
data and the standard model prediction. With the goal 
of discovery, emphasis is placed on examining discrep- 
ancies, focusing on outliers rather than global goodness 
of fit. Individual discrepancies that are not statistically 
significant are generally not pursued. 

Vista and Sleuth are employed simultaneously, 
rather than sequentially. An effect highlighted by 
Sleuth prompts additional investigation of the discrep- 
ancy, usually resulting in a specific hypothesis explaining 
the discrepancy in terms of a detector effect or adjust- 
ment to the standard model prediction that is then fed 
back and tested for global consistency using Vista. 

Forming hypotheses for the cause of specific discrepan- 
cies, implementing those hypotheses to assess their wider 
consequences, and testing global agreement after the im- 
plementation are emphasized as the crucial activities for 
the investigator throughout the process of data analysis. 
This process is constrained by the requirement that all 
adjustments be physically motivated. The investigation 
and resolution of discrepancies highlighted by the algorithms
is the defining characteristic of this global analysis [52].



This search for new physics terminates when one of
two conditions is satisfied: either a compelling case for
new physics is made, or there remain no statistically sig- 
nificant discrepancies on which a new physics case can 
be made. In the former case, to quantitatively assess 
the significance of the potential discovery, a full treat- 
ment of systematic uncertainties must be implemented. 
In the latter case, it is sufficient to demonstrate that all 
observed effects are not in significant disagreement with 
an appropriate global Standard Model description. 



III. VISTA 

This section describes Vista: object identification,
event selection, estimation of standard model back- 
grounds, simulation of the CDF detector response, de- 
velopment of a correction model, and results. 



A. CDF II detector 

CDF II is a general purpose detector [7, 8] designed
to detect particles produced in $p\bar{p}$ collisions. The detec-
tor has a cylindrical layout centered on the accelerator 
beamline. 

CDF uses a cylindrical coordinate system with the $z$-axis
along the axis of the colliding beams. The variable $\theta$
is the polar angle relative to the incoming proton beam,
and the variable $\phi$ is the azimuthal angle about the beam
axis. The pseudorapidity of a particle trajectory is defined
as $\eta = -\ln(\tan(\theta/2))$. It is also useful to define
detector pseudorapidity $\eta_{\rm det}$, denoting a particle's
pseudorapidity in a coordinate system in which the origin lies
at the center of the CDF detector rather than at the event
vertex. The transverse momentum $p_T$ is the component
of the momentum projected on a plane perpendicular to
the beam axis.
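
As an illustration of these definitions, the following minimal Python sketch evaluates the pseudorapidity from the polar angle and the transverse momentum from Cartesian momentum components. The function names are hypothetical and are not part of any CDF software.

```python
import math

def pseudorapidity(theta):
    """Return eta = -ln(tan(theta/2)) for a polar angle theta in radians.

    Valid for 0 < theta < pi; theta is measured from the beam (z) axis.
    """
    return -math.log(math.tan(theta / 2.0))

def transverse_momentum(px, py):
    """Return the magnitude of the momentum component transverse to the beam."""
    return math.hypot(px, py)

# A trajectory perpendicular to the beam has eta close to zero,
# and trajectories closer to the beam axis have larger |eta|.
eta_perpendicular = pseudorapidity(math.pi / 2.0)
eta_forward = pseudorapidity(0.1)
```

With these conventions, positive pseudorapidity corresponds to polar angles below 90 degrees, that is, to the incoming proton direction.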

Charged particle tracks are reconstructed in a 3.1 m 
long open cell drift chamber that performs up to 96 mea- 
surements of the track position in the radial region from 
0.4 m to 1.4 m. Between the beam pipe and this tracking 
chamber are multiple layers of silicon microstrip detec- 
tors, enabling high precision determination of the impact 
parameter of a track relative to the primary event vertex. 
The tracking detectors are immersed in a uniform 1.4 T 
solenoidal magnetic field. 

Outside the solenoid, calorimeter modules are arranged 
in a projective tower geometry to provide energy mea- 
surements for both charged and neutral particles. Pro- 
portional chambers are embedded in the electromagnetic 
calorimeters to measure the transverse profile of electro- 
magnetic showers at a depth corresponding to the shower 
maximum for electrons. The outermost part of the de- 
tector consists of a series of drift chambers used to detect 
and identify muons, minimum ionizing particles that typ- 
ically pass through the calorimeter. 






A set of forward gas Cherenkov detectors is used to
measure the average number of inelastic $p\bar{p}$ collisions per
Tevatron bunch crossing, and hence determine the lumi- 
nosity acquired. A three level trigger and data acqui- 
sition system selects the most interesting collisions for 
offline analysis. 

Here and below the word "central" is used to describe
objects with $|\eta_{\rm det}| < 1.0$; "plug" is used to describe
objects with $1.0 < |\eta_{\rm det}| < 2.5$.



B. Object identification 

Energetic and isolated electrons, muons, taus, photons,
jets, and b-tagged jets with $|\eta_{\rm det}| < 2.5$ and $p_T > 17$ GeV
are identified according to standard criteria. The same
criteria are used for all events. The isolation criteria
employed vary according to object, but roughly require
less than 2 GeV of extra energy flow within a cone of
$\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2} = 0.4$ in $\eta$-$\phi$ space around each
object.
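
The rough isolation requirement can be sketched in Python as follows. This is only an illustration under stated assumptions: the representation of the extra energy flow as a list of `(eta, phi, energy)` deposits, and the function names, are hypothetical; the actual criteria vary by object type.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Return Delta R = sqrt(Delta eta^2 + Delta phi^2).

    The azimuthal difference is wrapped into [-pi, pi] so that objects
    on either side of the phi = 0 / 2*pi boundary are treated as close.
    """
    d_eta = eta1 - eta2
    d_phi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(d_eta, d_phi)

def is_isolated(obj_eta, obj_phi, deposits, cone=0.4, max_extra_gev=2.0):
    """Check the rough isolation criterion described in the text.

    deposits: iterable of (eta, phi, energy_gev) for energy flow that is
    NOT part of the object itself (an assumption of this sketch).
    """
    extra = sum(e for (eta, phi, e) in deposits
                if delta_r(obj_eta, obj_phi, eta, phi) < cone)
    return extra < max_extra_gev
```

Energy deposits outside the cone do not affect the decision, however large they are.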

Standard CDF criteria [9] are used to identify electrons
($e^\pm$) in the central and plug regions of the CDF detector.
Electrons are characterized by a narrow shower in the 
central or plug electromagnetic calorimeter and a match- 
ing isolated track in the central gas tracking chamber or 
a matching plug track in the silicon detector. 

Standard CDF muons ($\mu^\pm$) are identified using three
separate subdetectors in the regions $|\eta_{\rm det}| < 0.6$,
$0.6 < |\eta_{\rm det}| < 1.0$, and $1.0 < |\eta_{\rm det}| < 1.5$ [9]. Muons are
characterized by a track in the central tracking cham- 
ber matched to a track segment in the central muon de- 
tectors, with energy consistent with minimum ionizing 
deposition in the electromagnetic and hadronic calorime- 
ters along the muon trajectory. 

Narrow central jets with a single charged track are
identified as tau leptons ($\tau^\pm$) that have decayed
hadronically [10]. Taus are distinguished from electrons by
requiring a substantial fraction of their energy to be
deposited in the hadron calorimeter; taus are distinguished
from muons by requiring no track segment in the muon 
detector coinciding with the extrapolated track of the 
tau. Track and calorimeter isolation requirements are 
imposed. 

Standard CDF criteria requiring the presence of a narrow
electromagnetic cluster with no associated tracks are
used to identify photons ($\gamma$) in the central and plug
regions of the CDF detector [11].

Jets ($j$) are reconstructed using the JetClu [12] clustering
algorithm with a cone of size $\Delta R = 0.4$, unless the
event contains one or more jets with $p_T > 200$ GeV and
no leptons or photons, in which case cones of $\Delta R = 0.7$
are used. Jet energies are appropriately corrected to the
parton level [13]. Since uncertainties in the standard
model prediction grow with increasing jet multiplicity,
up to the four largest $p_T$ jets are used to characterize the
event; any reconstructed jets with $p_T$-ordered ranking of
five or greater are neglected, except in final states with
small summed scalar transverse momentum containing
only jets.

A secondary vertex b-tagging algorithm is used to identify
jets likely resulting from the fragmentation of a bottom
quark ($b$) produced in the hard scattering [14].

Momentum visible in the detector but not clustered 
into an electron, muon, tau, photon, jet, or b-tagged jet
is referred to as unclustered momentum (uncl).

Missing momentum ($\not{p}$) is calculated as the negative
vector sum of the 4-vectors of all identified objects and
unclustered momentum. An event is said to contain a $\not{p}$
object if the transverse momentum of this object exceeds
17 GeV, and if additional quality criteria discriminating
against fake missing momentum due to jet mismeasurement
are satisfied [53].
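
The negative vector sum can be sketched as below. This is a minimal illustration, assuming each identified object and the unclustered momentum are supplied as `(E, px, py, pz)` tuples in GeV; the function names are hypothetical, and the additional quality criteria mentioned above are not modeled.

```python
import math

def missing_momentum(four_vectors):
    """Return the negative sum of the given (E, px, py, pz) 4-vectors."""
    total = [0.0, 0.0, 0.0, 0.0]
    for vec in four_vectors:
        for i in range(4):
            total[i] += vec[i]
    return tuple(-component for component in total)

def has_missing_momentum_object(four_vectors, threshold_gev=17.0):
    """Apply only the 17 GeV transverse momentum requirement.

    The quality criteria against jet mismeasurement are outside
    the scope of this sketch.
    """
    _, px, py, _ = missing_momentum(four_vectors)
    return math.hypot(px, py) > threshold_gev
```

The input list is assumed to include the unclustered momentum as one of its entries, so that all visible momentum enters the sum.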



C. Event selection 

Events containing an energetic and isolated electron, 
muon, tau, photon, or jet are selected. A set of three 
level online triggers requires: 

• a central electron candidate with $p_T > 18$ GeV
passing level 3, with an associated track having
$p_T > 8$ GeV and an electromagnetic energy cluster
with $p_T > 16$ GeV at levels 1 and 2; or

• a central muon candidate with $p_T > 18$ GeV passing
level 3, with an associated track having $p_T > 15$ GeV
and muon chamber track segments at levels 1 and 2; or

• a central or plug photon candidate with $p_T > 25$ GeV
passing level 3, with hadronic to electromagnetic
energy less than 1:8 and with energy surrounding
the photon to the photon's energy less
than 1:7 at levels 1 and 2; or

• a central or plug jet with $p_T > 20$ GeV passing level
3, with 15 GeV of transverse momentum required
at levels 1 and 2, with corresponding prescales of
50 and 25, respectively; or

• a central or plug jet with $p_T > 100$ GeV passing
level 3, with energy clusters of 20 GeV and 90 GeV
required at levels 1 and 2; or

• a central electron candidate with $p_T > 4$ GeV and
a central muon candidate with $p_T > 4$ GeV passing
level 3, with a muon segment, electromagnetic
cluster, and two tracks with $p_T > 4$ GeV required
at levels 1 and 2; or

• a central electron or muon candidate with $p_T > 4$ GeV
and a plug electron candidate with $p_T > 8$ GeV,
requiring a central muon segment and track
or central electromagnetic energy cluster and track
at levels 1 and 2, together with an isolated plug
electromagnetic energy cluster; or

• two central or plug electromagnetic clusters with
$p_T > 18$ GeV passing level 3, with hadronic to
electromagnetic energy less than 1:8 at levels 1 and 2;
or

• two central tau candidates with $p_T > 10$ GeV
passing level 3, each with an associated track having
$p_T > 10$ GeV and a calorimeter cluster with
$p_T > 5$ GeV at levels 1 and 2.

Events satisfying one or more of these online triggers 
are recorded for further study. Offline event selection for 
this analysis uses a variety of further filters. Single object 
requirements keep events containing: 

• a central electron with $p_T > 25$ GeV, or

• a plug electron with $p_T > 40$ GeV, or

• a central muon with $p_T > 25$ GeV, or

• a central photon with $p_T > 60$ GeV, or

• a central jet or b-tagged jet with $p_T > 200$ GeV, or

• a central jet or b-tagged jet with $p_T > 40$ GeV
(prescaled by a factor of roughly $10^4$),

possibly with other objects present. Multiple object cri- 
teria select events containing: 

• two electromagnetic objects (electron or photon)
with $|\eta| < 2.5$ and $p_T > 25$ GeV, or

• two taus with $|\eta| < 1.0$ and $p_T > 17$ GeV, or

• a central electron or muon with $p_T > 17$ GeV and
a central or plug electron, central muon, or central
tau with $p_T > 17$ GeV, or

• a central photon with $p_T > 40$ GeV and a central
electron or muon with $p_T > 17$ GeV, or

• a central or plug photon with $p_T > 40$ GeV and a
central tau with $p_T > 40$ GeV, or

• a central photon with $p_T > 40$ GeV and a central
b-jet with $p_T > 25$ GeV, or

• a central jet or b-tagged jet with $p_T > 40$ GeV and
a central tau with $p_T > 17$ GeV (prescaled by a
factor of roughly $10^3$), or

• a central or plug photon with $p_T > 40$ GeV and
two central taus with $p_T > 17$ GeV, or

• a central or plug photon with $p_T > 40$ GeV and
two central b-tagged jets with $p_T > 25$ GeV, or

• a central or plug photon with $p_T > 40$ GeV, a central
tau with $p_T > 25$ GeV, and a central b-tagged
jet with $p_T > 25$ GeV,



possibly with other objects present. Explicit online triggers
feeding this offline selection are required. The $p_T$
thresholds for these criteria are chosen to be sufficiently
above the online trigger turn-on curves that trigger
efficiencies can be treated as roughly independent of object
$p_T$.

Good run criteria are imposed, requiring the operation 
of all major subdetectors. To reduce contributions from 
cosmic rays and events from beam halo, standard CDF 
cosmic ray and beam halo filters are applied [15].

These selections result in a sample of roughly two million
high-$p_T$ data events in an integrated luminosity of
927 pb$^{-1}$.



D. Event generation 

Standard model backgrounds are estimated by generating
a large sample of Monte Carlo events, using the
Pythia [16], MadEvent [17], and Herwig [18] generators.
MadEvent performs an exact leading order matrix
element calculation, and provides 4-vector information
corresponding to the outgoing legs of the underlying
Feynman diagrams, together with color flow information.
Pythia 6.218 is used to handle showering and fragmentation.
The CTEQ5L [19] parton distribution functions
are used.

QCD jets. QCD dijet and multijet production are estimated
using Pythia. Samples are generated with Tune
A [20] with lower cuts on $\hat{p}_T$, the transverse momentum
of the scattered partons in the center of momentum frame
of the incoming partons, of 0, 10, 18, 40, 60, 90, 120, 150,
200, 300, and 400 GeV. These samples are combined to
provide a complete estimation of QCD jet production,
using the sample with greatest statistics in each range of
$\hat{p}_T$.

$\gamma$+jets. The estimation of QCD single prompt photon
production comes from Pythia. Five samples are
generated with Tune A corresponding to lower cuts on
$\hat{p}_T$ of 8, 12, 22, 45, and 80 GeV. These samples are combined
to provide a complete estimation of single prompt
photon production in association with one or more jets,
placing cuts on $\hat{p}_T$ to avoid double counting.

$\gamma\gamma$+jets. QCD diphoton production is estimated using
Pythia.

V+jets. The estimation of V+jets processes (with V
denoting W or Z), where the W or Z decays to first or
second generation leptons, comes from MadEvent, with
Pythia employed for showering. Tune AW [20] is used
within Pythia for these samples. The CKKW matching
prescription [21] with a matching scale of 15 GeV is used
to combine these samples and avoid double counting.
Additional statistics are generated on the high-$p_T$ tails using
the MLM matching prescription [22]. The factorization
scale is set to the vector boson mass; the renormalization
scale for each vertex is set to the $p_T$ of the jet. W+4 jets
are generated inclusively in the number of jets; Z+3 jets
are generated inclusively in the number of jets.

VV+jets. The estimation of WW, WZ, and ZZ production
with zero or more jets comes from Pythia.

V$\gamma$+jets. The estimation of W$\gamma$ and Z$\gamma$ production
comes from MadEvent, with showering provided by
Pythia. These samples are inclusive in the number of
jets.

$W(\to\tau\nu)$+jets. Estimation of $W\to\tau\nu$ with zero or
more jets comes from Pythia.

$Z(\to\tau\tau)$+jets. Estimation of $Z\to\tau\tau$ with zero or
more jets comes from Pythia.

$t\bar{t}$. Top quark pair production is estimated using
Herwig assuming a top quark mass of 175 GeV and an
NNLO cross section of 6.77 ± 0.42 pb.

Remaining processes, including for example $Z(\to\ldots)$
and $Z(\to\ell^+\ell^-)b\bar{b}$, are generated by systematically
looping over possible final state partons, using MadGraph [24]
to determine all relevant diagrams, and using
MadEvent to perform a Monte Carlo integration
over the final state phase space and to generate events.
The MLM matching prescription is employed to combine
samples with different numbers of final state jets.

A higher statistics estimate of the high-$p_T$ tails is obtained
by computing the thresholds in $\sum p_T$ corresponding
to the top 10% and 1% of each process, where $\sum p_T$
denotes the scalar summed transverse momentum of all
identified objects in an event. Roughly ten times as many
events are generated for the top 10%, and roughly one
hundred times as many events are generated for the top
1%.

Cosmic rays. Backgrounds from cosmic ray or beam 
halo muons that interact with the hadronic or electro- 
magnetic calorimeters, producing objects that look like a 
photon or jet, are estimated using a sample of data events 
containing fewer than three reconstructed tracks. This 
procedure is described in more detail in Appendix A 2 a.

Minimum bias. Minimum bias events are overlaid ac- 
cording to run-dependent instantaneous luminosity in 
some of the Monte Carlo samples, including those used 
for inclusive W and Z production. In all samples not 
containing overlaid minimum bias events, including those 
used to estimate QCD dijet production, additional un- 
clustered momentum is added to events to mimic the 
effect of the majority of multiple interactions, in which 
a soft dijet event accompanies the rare hard scattering 
of interest. A random number is drawn from a Gaussian
centered at 0 with width 1.5 GeV for each of the x and y
components of the added unclustered momentum.
Backgrounds due to two rare hard scatterings occurring in the
same bunch crossing are estimated by forming overlaps
of events, as described in Appendix A 2 b.
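
The addition of unclustered momentum can be sketched as follows. This is an illustrative sketch only: it assumes the Gaussian is centered at zero, and the function name and the optional `rng` argument are hypothetical choices made here for reproducibility.

```python
import random

def add_unclustered_momentum(uncl_px, uncl_py, width_gev=1.5, rng=random):
    """Return (px, py) of the unclustered momentum after adding noise.

    One independent Gaussian random number, with zero mean (an assumption
    of this sketch) and the given width, is added to each of the x and y
    components.
    """
    return (uncl_px + rng.gauss(0.0, width_gev),
            uncl_py + rng.gauss(0.0, width_gev))

# Example with a seeded generator so that the result is reproducible.
generator = random.Random(2024)
smeared = add_unclustered_momentum(3.0, -1.0, rng=generator)
```

Passing a seeded `random.Random` instance keeps the procedure repeatable from one run to the next, which is convenient when comparing versions of a correction model.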

Each generated standard model event is assigned a 
weight, calculated as the cross section for the process 
(in units of picobarns) divided by the number of events 
generated for that process, representing the number of 
such events expected in a data sample corresponding to 
an integrated luminosity of 1 pb$^{-1}$. When multiplied by
the integrated luminosity of the data sample used in this
analysis, the weight gives the predicted number of such
events in this analysis.
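
The weighting arithmetic can be illustrated with a short Python sketch. The cross section and sample size used in the example are invented for illustration and do not correspond to any process in this analysis; the function names are hypothetical.

```python
def event_weight(cross_section_pb, n_generated):
    """Weight of one generated event: expected events per 1 pb^-1 of data."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return cross_section_pb / n_generated

def predicted_events(weights, luminosity_inv_pb):
    """Predicted event count: summed weights times integrated luminosity."""
    return luminosity_inv_pb * sum(weights)

# Hypothetical process: 10 pb cross section, 100000 generated events,
# of which 500 fall in some final state of interest.
w = event_weight(10.0, 100000)
n_expected = predicted_events([w] * 500, 927.0)
```

In this invented example each generated event carries a weight of $10^{-4}$, so 500 selected events correspond to roughly 46 expected events in 927 pb$^{-1}$.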



E. Detector simulation 

The response of the CDF detector is simulated using
a GEANT-based detector simulation (CdfSim) [25], with
GFLASH [26] used to simulate shower development in the
calorimeter.

In $p\bar{p}$ collisions there is an ordering of frequency with
which objects of different types are produced: many more
jets ($j$) are produced than b-jets ($b$) or photons ($\gamma$), and
many more of these are produced than charged leptons
($e$, $\mu$, $\tau$). The CDF detectors and reconstruction algorithms
have been designed such that the probability of
misreconstructing a frequently produced object as an
infrequently produced object is small. The fraction of central
jets that CdfSim misreconstructs as photons, electrons,
and muons is $\sim 10^{-3}$, $\sim 10^{-4}$, and $\sim 10^{-5}$, respectively.
Due to these small numbers, the use of CdfSim
to model these fake processes would require generating
samples with prohibitively large statistics. Instead, the
modeling of a frequently produced object faking a less
frequently produced object (specifically: $j$ faking $b$, $\gamma$, $e$,
$\mu$, or $\tau$; or $b$ or $\gamma$ faking $e$, $\mu$, or $\tau$) is obtained by the
application of a misidentification probability, a particular
type of correction factor in the Vista correction model,
described in the next section.

In Monte Carlo samples passed through CdfSim, re- 
constructed leptons and photons are required to match to 
a corresponding generator level object. This procedure 
removes reconstructed leptons or photons that arise from 
a misreconstructed quark or gluon jet. 



F. Correction model 

Unfortunately some numbers that cannot be deter- 
mined from first principles enter the comparison between 
data and the standard model prediction. These numbers 
are referred to as "correction factors" in the Vista cor- 
rection model. This correction model is applied to gen- 
erated Monte Carlo events to obtain the standard model 
prediction across all final states. 

Correction factors must be obtained from the data 
themselves. These factors may be thought of as Bayesian 
nuisance parameters. The actual values of the correction 
factors are not directly of interest. Of interest is the 
comparison of data to standard model prediction, with 
correction factors adjusted to whatever they need to be, 
consistent with external constraints, to bring the stan- 
dard model into closest agreement with the data. 

The traditional prescription for determining these cor- 
rection factors is to measure them in a control region in 
which no signal is expected. This procedure encounters 
difficulty when the entire high-$p_T$ data sample is considered
to be a signal region. The approach adopted instead
is to ask whether a consistent set of correction factors
can be chosen so that the standard model prediction is
in agreement with the CDF high-$p_T$ data.

Code | Category   | Explanation                                          | Value                   | Error                  | Error(%)
-----|------------|------------------------------------------------------|-------------------------|------------------------|---------
0001 | luminosity | CDF integrated luminosity                            | 927                     | 20                     | 2.2
0002 | k-factor   | cosmic $\gamma$                                      | 0.69                    | 0.05                   | 7.3
0003 | k-factor   | cosmic $j$                                           | 0.446                   | 0.014                  | 3.1
0004 | k-factor   | $1\gamma 1j$                                         | 0.95                    | 0.04                   | 4.2
0005 | k-factor   | $1\gamma 2j$                                         | 1.2                     | 0.05                   | 4.1
0006 | k-factor   | $1\gamma 3j$                                         | 1.48                    | 0.07                   | 4.7
0007 | k-factor   |                                                      | 1.97                    | 0.16                   | 8.1
0008 | k-factor   | $2\gamma 0j$                                         | 1.81                    | 0.08                   | 4.4
0009 | k-factor   | $2\gamma 1j$                                         | 3.42                    | 0.24                   | 7.0
0010 | k-factor   | $2\gamma 2j+$                                        | 1.3                     | 0.16                   | 12.3
0011 | k-factor   | $W0j$                                                | 1.453                   | 0.027                  | 1.9
0012 | k-factor   | $W1j$                                                | 1.06                    | 0.03                   | 2.8
0013 | k-factor   | $W2j$                                                | 1.02                    | 0.03                   | 2.9
0014 | k-factor   | $W3j+$                                               | 0.76                    | 0.05                   | 6.6
0015 | k-factor   | $Z0j$                                                | 1.419                   | 0.024                  | 1.7
0016 | k-factor   | $Z1j$                                                | 1.18                    | 0.04                   | 3.4
0017 | k-factor   | $Z2j+$                                               | 1.03                    | 0.05                   | 4.8
0018 | k-factor   | $2j$, $p_T < 150$                                    | 0.96                    | 0.022                  | 2.3
0019 | k-factor   | $2j$, $150 < p_T$                                    | 1.256                   | 0.028                  | 2.2
0020 | k-factor   | $3j$, $p_T < 150$                                    | 0.921                   | 0.021                  | 2.3
0021 | k-factor   | $3j$, $150 < p_T$                                    | 1.36                    | 0.03                   | 2.4
0022 | k-factor   | $4j$, $p_T < 150$                                    | 0.989                   | 0.025                  | 2.5
0023 | k-factor   | $4j$, $150 < p_T$                                    | 1.7                     | 0.04                   | 2.3
0024 | k-factor   | $5j+$                                                | 1.25                    | 0.05                   | 4.0
0025 | ID eff     | $p(e\to e)$ central                                  | 0.986                   | 0.006                  | 0.6
0026 | ID eff     | $p(e\to e)$ plug                                     | 0.933                   | 0.009                  | 1.0
0027 | ID eff     | $p(\mu\to\mu)$, $|\eta| < 0.6$                       | 0.845                   | 0.008                  | 0.9
0028 | ID eff     | $p(\mu\to\mu)$, $0.6 < |\eta|$                       | 0.915                   | 0.011                  | 1.2
0029 | ID eff     | $p(\gamma\to\gamma)$ central                         | 0.974                   | 0.018                  | 1.8
0030 | ID eff     | $p(\gamma\to\gamma)$ plug                            | 0.913                   | 0.018                  | 2.0
0031 | ID eff     | $p(b\to b)$ central                                  | 1                       | 0.04                   | 4.0
0032 | fake rate  | $p(e\to\gamma)$ plug                                 | 0.045                   | 0.012                  | 27.0
0033 | fake rate  | $p(q\to e)$ central                                  | $9.71\times10^{-5}$     | $1.9\times10^{-6}$     | 2.0
0034 | fake rate  | $p(q\to e)$ plug                                     | 0.000876                | $1.8\times10^{-5}$     | 2.1
0035 | fake rate  | $p(q\to\mu)$                                         | $1.157\times10^{-5}$    | $2.7\times10^{-7}$     | 2.3
0036 | fake rate  | $p(j\to b)$                                          | 0.01684                 | 0.00027                | 1.6
0037 | fake rate  | $p(q\to\tau)$, $p_T < 60$                            | 0.00341                 | 0.00012                | 3.5
0038 | fake rate  | $p(q\to\tau)$, $60 < p_T$                            | 0.00038                 | $4\times10^{-5}$       | 10.5
0039 | fake rate  | $p(q\to\gamma)$ central                              | 0.000265                | $1.5\times10^{-5}$     | 5.7
0040 | fake rate  | $p(q\to\gamma)$ plug                                 | 0.00159                 | 0.00013                | 8.2
0041 | trigger    | $p(e\to{\rm trig})$ central, $p_T > 25$              | 0.976                   | 0.007                  | 0.7
0042 | trigger    | $p(e\to{\rm trig})$ plug, $p_T > 25$                 | 0.835                   | 0.015                  | 1.8
0043 | trigger    | $p(\mu\to{\rm trig})$, $|\eta| < 0.6$, $p_T > 25$    | 0.917                   | 0.007                  | 0.8
0044 | trigger    | $p(\mu\to{\rm trig})$, $0.6 < |\eta| < 1.0$, $p_T > 25$ | 0.96                 | 0.01                   | 1.0

TABLE I: The 44 correction factors introduced in the Vista correction model. The leftmost column (Code) shows correction
factor codes. The second column (Category) shows correction factor categories. The third column (Explanation) provides a
short description. The correction factor best fit value (Value) is given in the fourth column. The correction factor error (Error)
resulting from the fit is shown in the fifth column. The fractional error (Error(%)) is listed in the sixth column. All values
are dimensionless with the exception of code 0001 (luminosity), which has units of pb$^{-1}$. The values and uncertainties of these
correction factors are valid within the context of this correction model.

The correction model is obtained by an iterative pro- 
cedure informed by observed inadequacies in modeling. 
The process of correction model improvement, motivated 
by observed discrepancies, may allow a real signal to be 
artificially suppressed. If adjusting correction factor val- 
ues within allowed bounds removes a signal, then the 
case for the signal disappears, since it can be explained 
in terms of known physics. This is true in any analysis. 
The stronger the constraints on the correction model, the 
more difficult it is to artificially suppress a real signal. By 
requiring a consistent interpretation of hundreds of final 
states, Vista is less likely to mistakenly explain away 
new physics than if it had more limited scope. 

The 44 correction factors currently included in the 
Vista correction model are shown in Table I. These
factors can be classified into two categories: theoretical
and experimental. A more detailed description of each
individual correction factor is provided in Appendix A 4.

Theoretical correction factors reflect the practical difficulty
of calculating accurately within the framework
of the standard model. These factors take the form
of $k$-factors, so-called "knowledge factors," representing
the ratio of the unavailable all order cross section
to the calculable leading order cross section. Twenty-three
$k$-factors are used for standard model processes including
QCD multijet production, W+jets, Z+jets, and
(di)photon+jets production.

Experimental correction factors include the integrated 
luminosity of the data, efficiencies associated with trig- 
gering on electrons and muons, efficiencies associated 
with the correct identification of physics objects, and 
fake rates associated with the mistaken identification of 
physics objects. Obtaining an adequate description of 
object misidentification has required an understanding 
of the underlying physical mechanisms by which objects 
are misreconstructed, as described in Appendix A 1.

In the interest of simplicity, correction factors representing
$k$-factors, efficiencies, and fake rates are generally
taken to be constants, independent of kinematic quantities
such as object $p_T$, with only five exceptions. The $p_T$
dependence of three fake rates is too large to be treated
as approximately constant: the jet faking electron rate
$p(j\to e)$ in the plug region of the CDF detector; the jet
faking b-tagged jet rate $p(j\to b)$, which increases steadily
with increasing $p_T$; and the jet faking tau rate $p(j\to\tau)$,
which decreases steadily with increasing $p_T$. Two other
fake rates possess geometrical features in $\eta$-$\phi$ due to the
construction of the CDF detector: the jet faking electron
rate $p(j\to e)$ in the central region, because of the fiducial
tower geometry of the electromagnetic calorimeter;
and the jet faking muon rate $p(j\to\mu)$, due to the nontrivial
fiducial geometry of the muon chambers. After
determining appropriate functional forms, a single overall
multiplicative correction factor is used.

Correction factor values are obtained from a global fit 
to the data. The procedure is outlined here, with further
details relegated to Appendix A 3.

Events are first partitioned into final states according 
to the number and types of objects present. Each final 
state is then subdivided into bins according to each object's
detector pseudorapidity ($\eta_{\rm det}$) and transverse
momentum ($p_T$), as described in Appendix A 3 a.

Generated Monte Carlo events, adjusted by the cor- 
rection model, provide the standard model prediction for 
each bin. The standard model prediction in each bin is 
therefore a function of the correction factor values. A 
figure of merit is defined to quantify global agreement 
between the data and the standard model prediction, 
and correction factor values are chosen to maximize this 
agreement, consistent with external experimental con- 
straints. 

Letting $s$ represent a vector of correction factors, for the $k$th
bin

$$\chi^2_k(s) = \frac{(\mathrm{Data}[k] - \mathrm{SM}[k])^2}{\left(\sqrt{\mathrm{SM}[k]}\right)^2 + \delta\mathrm{SM}[k]^2}, \qquad (1)$$

where Data[$k$] is the number of data events observed in the $k$th
bin, SM[$k$] is the number of events predicted by the standard model
in the $k$th bin, $\delta$SM[$k$] is the Monte Carlo statistical
uncertainty on the standard model prediction in the $k$th bin [54],
and $\sqrt{\mathrm{SM}[k]}$ is the statistical uncertainty on the
expected data in the $k$th bin. The standard model prediction SM[$k$]
in the $k$th bin is a function of $s$.

Relevant information external to the Vista high-$p_T$ data sample
provides additional constraints in this global fit. The CDF
luminosity counters measure the integrated luminosity of the sample
described in this article to be 902 pb$^{-1}$ ± 6% by measuring the
fraction of bunch crossings in which zero inelastic collisions
occur [27]. The integrated luminosity of the sample measured by the
luminosity counters enters in the form of a Gaussian constraint on
the luminosity correction factor. Higher order theoretical
calculations exist for some standard model processes, providing
constraints on corresponding k-factors, and some CDF experimental
correction factors are also constrained from external information.
In total, 26 of the 44 correction factors are constrained. The
specific constraints employed are provided in Appendix A 3 b.

The overall function to be minimized takes the form

$$\chi^2(s) = \left( \sum_{k \in \mathrm{bins}} \chi^2_k(s) \right) + \chi^2_{\mathrm{constraints}}(s), \qquad (2)$$

where the sum in the first term is over bins in the CDF high-$p_T$
data sample with $\chi^2_k(s)$ defined in Eq. (1), and the second
term is the contribution from explicit constraints.
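
As a rough illustration of Eqs. (1) and (2), the figure of merit can be sketched in a few lines of Python. The function names, the toy bin contents, and the treatment of each external constraint as a simple Gaussian penalty are assumptions made for this sketch; the constraints actually used are those of Appendix A 3 b, and the real fit involves 44 correction factors rather than one.

```python
def chi2_bin(data, sm, delta_sm):
    """Eq. (1): squared discrepancy in one bin, divided by the Poisson
    variance SM[k] plus the Monte Carlo statistical variance deltaSM[k]^2."""
    return (data - sm) ** 2 / (sm + delta_sm ** 2)


def chi2_constraints(s, constraints):
    """Assumed form: one Gaussian penalty per constrained factor.
    `constraints` maps an index into s to (central value, uncertainty)."""
    return sum(((s[i] - mu) / sigma) ** 2 for i, (mu, sigma) in constraints.items())


def chi2_total(s, bins, predict, constraints):
    """Eq. (2): sum over bins plus the constraint term.
    `predict(s, k)` is a hypothetical callback returning (SM[k], deltaSM[k])."""
    total = 0.0
    for k, data in enumerate(bins):
        sm, delta_sm = predict(s, k)
        total += chi2_bin(data, sm, delta_sm)
    return total + chi2_constraints(s, constraints)


# Toy example (invented numbers): a single luminosity-like scale factor
# applied to fixed Monte Carlo yields, constrained to 1.00 +- 0.06.
mc_yield = [100.0, 50.0]
mc_error = [2.0, 1.0]
observed = [110, 55]


def toy_predict(s, k):
    return s[0] * mc_yield[k], s[0] * mc_error[k]


for scale in (1.0, 1.1):
    print(scale, chi2_total([scale], observed, toy_predict, {0: (1.0, 0.06)}))
```

In this toy, raising the scale removes the bin-by-bin disagreement but is penalized by the constraint term, which is the trade-off the global fit resolves.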

Minimization of $\chi^2(s)$ in Eq. (2) as a function of the vector of
correction factors $s$ results in a set of correction factor values
$s_0$ providing the best global agreement between the data and the
standard model prediction. The best fit correction factor values are
shown in Table I, together with absolute and fractional
uncertainties. The determined uncertainties are not used explicitly
in the subsequent analysis, but rather provide information used
implicitly to assist in appropriate adjustment to the correction
model in light of observed discrepancies. The uncertainties are
verified by subdividing the data into thirds, performing separate
fits on each third, and noting that the correction factor values
obtained with each subset are consistent within quoted uncertainties.
Further details on the correlation matrix and other technical aspects
of this global fit can be found in Appendix A 3 c.

Although the correction factors are determined from a global fit, in
practice the determination of many correction factors' values is
dominated by one recognizable subsample. The rate $p(j \to e)$ for a
jet to fake an electron is determined largely by the number of events
in the ej final state, since the largest contribution to this final
state is from dijet events with one jet misreconstructed as an
electron. Similarly, the rates $p(j \to b)$ and $p(j \to \tau)$ for a
jet to fake a b-tagged jet and tau lepton are determined largely by
the number of events in the bj and $\tau$j final states,
respectively. The determination of the fake rate $p(j \to \gamma)$,
photon efficiency $p(\gamma \to \gamma)$, and k-factors for prompt
photon production and prompt diphoton production is dominated by the
j$\gamma$, jj$\gamma$, and $\gamma\gamma$ final states. Additional
knowledge incorporated in the determination of fake rates is
described in Appendix A 1.

The global fit $\chi^2$ per number of bins is 288.1/133 + 27.9,
where the last term is the contribution to the $\chi^2$ from the
imposed constraints. A $\chi^2$ per degree of freedom larger than
unity is expected, since the limited set of correction factors in
this correction model is not expected to provide a complete
description of all features of the data. Emphasis is placed on
individual outlying discrepancies that may motivate a new physics
claim, rather than overall goodness of fit.

Corrections to object identification efficiencies are typically less
than 10%; fake rates are consistent with an understanding of the
underlying physical mechanisms responsible; k-factors range from
slightly less than unity to greater than two for some processes with
multiple jets. All values obtained are physically reasonable. Further
analysis is provided in Appendix A 4.

With the details of the correction model in place, the 
complete standard model prediction can be obtained. For 
each Monte Carlo event after detector simulation, the 
event weight is multiplied by the value of the luminosity
correction factor and the k-factor for the relevant standard
model process. The single Monte Carlo event can
be misreconstructed in a number of ways, producing a 
set of Monte Carlo events derived from the original, with 
weights multiplied by the probability of each misrecon- 
struction. The weight of each resulting event is multiplied 
by the probability the event satisfies trigger criteria. The 
resulting standard model prediction, corrected as just de- 
scribed, is referred to as "the standard model prediction" 
throughout the rest of this paper, with "corrected" im- 
plied in all cases. 
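
The weighting chain just described can be sketched as follows. This is only an illustration under stated assumptions: the function name, the tuple layout, and all numerical values are hypothetical and are not taken from the Vista correction model.

```python
def weighted_events(mc_weight, luminosity_factor, k_factor, reconstructions):
    """Return (final_state, weight) pairs derived from one Monte Carlo event.

    `reconstructions` lists the ways the event can be reconstructed, each as
    (final_state, reconstruction_probability, trigger_probability).
    """
    # The luminosity correction factor and the process k-factor scale
    # every event derived from the original one.
    base = mc_weight * luminosity_factor * k_factor
    return [(state, base * p_reco * p_trig)
            for state, p_reco, p_trig in reconstructions]


# Invented numbers: a dijet event that is usually reconstructed as jj,
# and occasionally as ej when one jet fakes an electron.
for state, weight in weighted_events(
        mc_weight=2.0, luminosity_factor=1.02, k_factor=1.5,
        reconstructions=[("jj", 0.9998, 0.01), ("ej", 0.0002, 0.9)]):
    print(state, weight)
```

Summing such weights over all Monte Carlo events in a given final state and bin would give the corrected prediction for that bin.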

FIG. 1: Distribution of observed discrepancy between data and the
standard model prediction, measured in units of standard deviation
($\sigma$), shown as the solid (green) histogram, before accounting
for the trials factor. The upper pane shows the distribution of
discrepancies between the total number of events observed and
predicted in the 344 populated final states considered. Negative
values on the horizontal axis correspond to a deficit of data
compared to standard model prediction; positive values indicate an
excess of data compared to standard model prediction. The lower pane
shows the distribution of discrepancies between the observed and
predicted shapes in 16,486 kinematic distributions. Distributions in
which the shapes of data and standard model prediction are in
relative disagreement correspond to large positive $\sigma$. The
solid (black) curves indicate expected distributions, if the data
were truly drawn from the standard model background. Interest is
focused on the entries in the tails of the upper distribution and the
high tail of the lower distribution. The final state entering the
upper histogram at $-4.03\sigma$ is the Vista 3j $\tau$ final state,
which heads Table II. Most of the distributions entering the lower
histogram with $> 4\sigma$ derive from the 3j $\Delta R(j_2, j_3)$
discrepancy, discussed in the text.

TABLE II: A subset of the Vista comparison between Tevatron Run II data and standard model prediction, showing the twenty
most discrepant final states and all final states populated with ten or more data events. Events are partitioned into exclusive
final states based on standard CDF object identification criteria. Final states are labeled in this table according to the number
and types of objects present, and whether (high $\sum p_T$) or not (low $\sum p_T$) the summed scalar transverse momentum of all
objects in the events exceeds 400 GeV, for final states not containing leptons or photons. Final states are ordered according to
decreasing discrepancy between the total number of events expected, taking into account the error from Monte Carlo statistics,
and the total number observed in the data. Final states exhibiting mild discrepancies are shown together with the significance
of the discrepancy in units of standard deviations ($\sigma$) after accounting for a trials factor corresponding to the number of final
states considered. Final states that do not exhibit even mild discrepancies are listed below the horizontal line in inverted
alphabetical order. Only Monte Carlo statistical uncertainties on the background prediction are included.

FIG. 2: The invariant mass of the tau lepton and two leading jets in
the final state consisting of three jets and one positively or
negatively charged tau. (The Vista final state naming convention
gives the tau lepton a positive charge.) Data are shown as filled
(black) circles, with the standard model prediction shown as the
shaded (red) histogram. This is the most discrepant kinematic
distribution in the final state exhibiting the largest population
discrepancy.



G. Results 

Data and standard model events are partitioned into 
exclusive final states. This partitioning is orthogonal, 
with each event ending up in one and only one final state. 
Data are compared to standard model prediction in each 
final state, considering the total number of events ob- 
served and predicted, and the shapes of relevant kine- 
matic distributions. 

In a data-driven search, it is crucial to explicitly account for the
trials factor, quantifying the number of places an interesting signal
could appear. Fluctuations at the level of three or more standard
deviations are expected to appear simply because a large number of
regions are considered. A reasonably rigorous accounting of this
trials factor is possible as long as the measures of interest and the
regions to which these measures are applied are specified a priori,
as is done here. In this analysis a discrepancy at the level of
$3\sigma$ or greater after accounting for the trials factor
(typically corresponding to a discrepancy at the level of $5\sigma$
or greater before accounting for the trials factor) is considered
"significant."

Discrepancy in the total number of events in a final state (fs) is
measured by the Poisson probability $p_{fs}$ that the number of
predicted events would fluctuate up to or above (or down to or below)
the number of events observed. To account for the trials factor due
to the 344 Vista final states examined, the quantity
$p = 1 - (1 - p_{fs})^{344}$ is calculated for each final state. The
result is the probability $p$ of observing a discrepancy
corresponding to a probability less than $p_{fs}$ in the total sample
studied. This probability $p$ can then be converted into units of
standard deviations. A final state exhibiting a population
discrepancy greater than $3\sigma$ after the trials factor is thus
accounted for is considered significant.
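
The trials-factor arithmetic can be checked with a short Python sketch. It uses the value $p_{fs} = 2.8 \times 10^{-5}$ quoted in the text for the most discrepant final state. The conversion to standard deviations assumes a one-sided Gaussian tail, which is a convention adopted here because it reproduces the quoted 4.03; the function names are illustrative.

```python
from statistics import NormalDist


def trials_corrected(p_fs, n_final_states=344):
    """Probability that at least one of n final states shows a
    discrepancy with probability below p_fs."""
    return 1.0 - (1.0 - p_fs) ** n_final_states


def to_sigma(p):
    """Assumed convention: the z for which the upper Gaussian tail has area p."""
    return NormalDist().inv_cdf(1.0 - p)


p_fs = 2.8e-5
p = trials_corrected(p_fs)
print(f"before trials factor: {to_sigma(p_fs):.2f} sigma")
print(f"after trials factor:  p = {p:.4f} ({to_sigma(p):.2f} sigma)")
```

With these inputs the corrected probability is close to 1%, well short of the $3\sigma$ threshold for significance.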

Many kinematic distributions are considered in each final state,
including the transverse momentum, pseudorapidity, detector
pseudorapidity, and azimuthal angle of all objects, masses of
individual jets and b-jets, invariant masses of all object
combinations, transverse masses of object combinations including the
missing momentum $\not{p}$, angular separation $\Delta\phi$ and
$\Delta R$ of all object pairs, and several other more specialized
variables. A Kolmogorov-Smirnov (KS) test is used to quantify the
difference in shape of each kinematic distribution between data and
standard model prediction. As with populations, a trials factor is
assessed to account for the 16,486 distributions examined, and the
resulting probability is converted into units of standard deviations.
A distribution with KS statistic greater than 0.02 and probability
corresponding to greater than $3\sigma$ after assessing the trials
factor is considered significant.
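
For readers unfamiliar with the KS statistic, a minimal Python sketch follows. It assumes the standard model prediction is represented by weighted Monte Carlo events and compares its normalized cumulative distribution with that of the unweighted data; the function name and inputs are hypothetical. Converting the statistic into a probability is not shown.

```python
def ks_statistic(data, mc_values, mc_weights):
    """Largest distance between the empirical CDF of the data and the
    weight-normalized CDF of the Monte Carlo prediction."""
    n = len(data)
    wsum = sum(mc_weights)
    # Each entry carries its step in the data CDF and in the MC CDF.
    points = sorted([(x, 1.0 / n, 0.0) for x in data] +
                    [(x, 0.0, w / wsum) for x, w in zip(mc_values, mc_weights)])
    f_data = f_mc = d_max = 0.0
    i = 0
    while i < len(points):
        x = points[i][0]
        # Absorb all entries sharing the same x before comparing the CDFs.
        while i < len(points) and points[i][0] == x:
            f_data += points[i][1]
            f_mc += points[i][2]
            i += 1
        d_max = max(d_max, abs(f_data - f_mc))
    return d_max


print(ks_statistic([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
print(ks_statistic([1.0, 2.0], [3.0, 4.0], [0.5, 0.5]))
```

Because only shapes are compared, the overall normalization of the Monte Carlo weights drops out, which is why populations are tested separately.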

Table II shows a subset of the Vista comparison of data to standard
model prediction. Shown are all final states containing ten or more
data events, with the most discrepant final states in population
heading the list. After accounting for the trials factor, no final
state has a statistically significant ($> 3\sigma$) population
discrepancy. The most discrepant final state (3j $\tau^{\pm}$)
contains 71 data events and 113.7 ± 3.6 events expected from the
standard model. The Poisson probability for 113.7 ± 3.6 expected
events to result in 71 or fewer events observed in this final state
is $2.8 \times 10^{-5}$, corresponding to an entry at $-4.03\sigma$
in Fig. 1. The probability for one or more of the 344 populated final
states considered to display disagreement in population corresponding
to a probability less than $2.8 \times 10^{-5}$ is 1%. The
3j $\tau^{\pm}$ population discrepancy is thus not statistically
significant. The most discrepant kinematic distribution in this final
state is the invariant mass of the tau lepton and the two highest
transverse momentum jets, shown in Fig. 2.

The six final states with largest population discrepancy are
3j $\tau$, 5j, 2j $\tau$, 2j 2$\tau$, bej, and the low-$p_T$ 3j final
state, with bej being the only one of these six to exhibit an excess
of data. The 3j $\tau$, 2j $\tau$, and 2j 2$\tau$ final states appear
to reflect an incomplete understanding of the rate of jets faking
taus ($p(j \to \tau)$) as a function of the number of jets in the
event, at the level of ~30% difference between the total number of
observed and predicted events in the most populated of these final
states. The value of $p(j \to \tau)$ is primarily determined by the
j $\tau$ final state. Interestingly, although the underlying physical
mechanism for $p(j \to e)$ is very similar to that for
$p(j \to \tau)$, as discussed in Appendix A 1, a significant
dependence on the presence of additional jets is not observed for
$p(j \to e)$.

The 5j discrepancy results from a tension with the e 4j final state,
whose dominant contribution comes from 5j production convoluted with
$p(j \to e)$. The low-$p_T$ 3j discrepancy results from a tension
with the e 2j final state, whose dominant contribution comes from 3j
production

FIG. 3: A shape discrepancy highlighted by Vista in the final state
consisting of exactly three reconstructed jets with $|\eta| < 2.5$
and $p_T > 17$ GeV, and with one of the jets satisfying $|\eta| < 1$
and $p_T > 40$ GeV. This distribution illustrates the effect
underlying most of the Vista shape discrepancies. Filled (black)
circles show CDF data, with the shaded (red) histogram showing the
prediction of Pythia. The discrepancy is clearly statistically
significant, with statistical error bars smaller than the size of the
data points. The vertical axis shows the number of events per bin,
with the horizontal axis showing the angular separation
($\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$) between the second
and third jets, where the jets are ordered according to decreasing
transverse momentum. In the region $\Delta R(j_2, j_3) > 2$,
populated primarily by initial state radiation, the standard model
prediction can to some extent be adjusted. The region
$\Delta R(j_2, j_3) < 2$ is dominated by final state radiation, the
description of which is constrained by data from LEP 1.



convoluted with $p(j \to e)$. The bej final state is predominantly 3j
production convoluted with $p(j \to b)$ and $p(j \to e)$; this
discrepancy also arises from a tension with the low-$p_T$ 3j and e 2j
final states. The bej final state is the Vista final state in which
the largest excess of data over standard model prediction is seen.
The fraction of hypothetical similar CDF experiments that would
produce a Vista normalization excess as significant as the excess
observed in this final state is 8%. The 5j, bej, and low-$p_T$ 3j
discrepancies correspond to a difference of ~10% between the total
number of observed and predicted events in these final states.

Figure 1 summarizes in a histogram the measured discrepancies between
data and the standard model prediction for CDF high-$p_T$ final state
populations and kinematic distributions. Values in this figure
represent individual discrepancies, and do not account for the trials
factor associated with examining many possibilities.

Of the 16,486 kinematic distributions considered, 384 distributions
are found to correspond to a discrepancy greater than $3\sigma$ after
accounting for the trials factor, entering with a KS probability of
roughly $5\sigma$ or greater in Fig. 1. Of these 384 discrepant
distributions, 312 are attributed to modeling parton radiation,
deriving from the 3j $\Delta R(j_2, j_3)$ discrepancy shown in
Fig. 3, with 186 of these 312 shape discrepancies pointing out that
individual jet masses are larger in data than in the prediction, as
shown in Fig. 4. A careful reading of the literature reveals that the
same effect was observed (but not emphasized) by both CDF and DØ in
Tevatron Run I. The 3j $\Delta R(j_2, j_3)$ and jet mass
discrepancies appear to be two different views of a single underlying
discrepancy, noting that two sufficiently nearby distinct jets
correspond to a pattern of calorimetric energy deposits similar to a
single massive jet. The underlying 3j $\Delta R(j_2, j_3)$
discrepancy is manifest in many other final states. The final state
bej, arising primarily from QCD production of three jets with one
misreconstructed as an electron, shows a similar discrepancy in
$\Delta R(j, b)$ in Fig. 5.

FIG. 4: The jet mass distribution in the bj final state with
$\sum p_T > 400$ GeV. The 3j $\Delta R(j_2, j_3)$ discrepancy
illustrated in Fig. 3 manifests itself also by producing jets more
massive in data than predicted by Pythia's showering algorithm. The
mass of a jet is determined by treating energy deposited in each
calorimeter tower as a massless 4-vector, summing the 4-vectors of
all towers within the jet, and computing the mass of the resulting
(massive) 4-vector.

FIG. 5: The distribution of $\Delta R$ between the jet and b-tagged
jet in the final state bej. The primary standard model contribution
to this final state is QCD three jet production with one jet
misreconstructed as an electron. The similarity to the
3j $\Delta R(j_2, j_3)$ discrepancy illustrated in Fig. 3 in the
region $\Delta R(j, b) < 2$ is clear. Less clear is the underlying
explanation for the difference with respect to Fig. 3 in the region
$\Delta R(j, b) > 2$.

While these discrepancies are clearly statistically significant,
basing a new physics claim around them is difficult. In the kinematic
regime of the discrepancy, different algorithms to match exact
leading order calculations with a parton shower lead to different
predictions [31]. Newer predictions have not been systematically
compared to LEP 1 data, which provide constraints on parton showering
reflected in Pythia's tuning. Further investigation into obtaining an
adequate QCD-based description of this discrepancy continues.

An additional 59 discrepant distributions reflect an inadequate
modeling of the overall transverse boost of the system. The overall
transverse boost of the primary physics objects in the event is
attributed to two sources: the intrinsic Fermi motion of the
colliding partons within the proton, and soft or collinear radiation
of the colliding partons as they approach collision. Together these
effects are here referred to as "intrinsic $k_T$," representing an
overall momentum kick to the hard scattering. Further discussion
appears in Appendix A 2 c.

The remaining 13 discrepant distributions are seen to 
be due to the coarseness of the Vista correction model. 
Most of these discrepancies, which are at the level of 
10% or less when expressed as (data − theory)/theory,
arise from modeling most fake rates as independent of 
transverse momentum. 

In summary, this global analysis of the bulk features 
of the high-$p_T$ data has not yielded a discrepancy motivating
a new physics claim. There are no statistically
significant population discrepancies in the 344 populated 
final states considered, and although there are several 
statistically significant discrepancies among the 16,486 
kinematic distributions investigated, the nature of these 
discrepancies makes it difficult to use them to support a 
new physics claim. 

This global analysis of course cannot conclude with 
certainty that there is no new physics hiding in the CDF 
data. The Vista population and shape statistics may be 
insensitive to a small excess of events appearing at large 
$\sum p_T$ in a highly populated final state. For such signals
another algorithm is required. 



IV. SLEUTH 

Taking a broad view of all proposed models that might extend the
standard model, a profound commonality is noted: nearly all predict
an excess of events at high $p_T$, concentrated in a particular final
state. The second stage of this research program involves the
systematic search for such physics using an algorithm called
Sleuth [32]. Sleuth is quasi model independent, where "quasi" refers
to the assumption that the first sign of new physics will appear as
an excess of events in some final state at large summed scalar
transverse momentum ($\sum p_T$).

The Sleuth algorithm used by CDF in Tevatron Run II is essentially
that developed by DØ in Tevatron Run I [33, 34, 35], and subsequently
improved by H1 in HERA Run I [36], with small modifications.

Sleuth's definition of interest relies on the following 
assumptions. 

1. The data can be categorized into exclusive final 
states in such a way that any signature of new 
physics is apt to appear predominantly in one of 
these final states. 

2. New physics will appear with objects at high
summed transverse momentum ($\sum p_T$) relative to
standard model and instrumental background.

3. New physics will appear as an excess of data over 
standard model and instrumental background. 



A. Algorithm 

The Sleuth algorithm consists of three steps, follow- 
ing the above three assumptions. 



1. Final states 

In the first step of the algorithm, all events are placed 
into exclusive final states as in Vista, with the following 
modifications. 

• Jets are identified as pairs, rather than individu- 
ally, to reduce the total number of final states and 
to keep signal events with one additional radiated 
gluon within the same final state. Final state names 
include "n jj" if n jet pairs are identified, with pos- 
sibly one unpaired jet assumed to have originated 
from a radiated gluon. 

• The present understanding of quark flavor suggests that b quarks
should be produced in pairs. Bottom quarks are identified as pairs,
rather than individually, to increase the robustness of
identification and to reduce the total number of final states. Final
state names include "n $b\bar{b}$" if n b pairs are identified.

• Final states related through global charge conjugation are
considered to be equivalent. Thus $e^+e^-\gamma$ is a different final
state than $e^+e^+\gamma$, but $e^+e^+\gamma$ and $e^-e^-\gamma$
together make up a single Sleuth final state.

• Final states related through global interchange of the first and
second generation are considered to be equivalent. Thus
$e^+\not{p}\gamma$ and $\mu^+\not{p}\gamma$ together make up a single
Sleuth final state. The decision to consider third generation objects
(b quarks and $\tau$ leptons) differently from first and second
generation objects reflects theoretical prejudice that the third
generation may be special, and the experimental ability (in the case
of b quarks) and experimental challenge (in the case of $\tau$
leptons) in the identification of third generation objects.

The symbol $\ell$ is used to denote electron or muon. The symbol $W$
is used in naming final states containing one electron or muon,
significant missing momentum, and perhaps other non-leptonic objects.
Thus the final states $e^+\not{p}\gamma$, $e^-\not{p}\gamma$,
$\mu^+\not{p}\gamma$, and $\mu^-\not{p}\gamma$ are combined into the
Sleuth final state $W\gamma$. A table showing the relationship
between Vista and Sleuth final states is provided in Appendix B 1.



2. Variable 

The second step of the algorithm considers a single variable in each
exclusive final state: the summed scalar transverse momentum of all
objects in the event ($\sum p_T$). Assuming momentum conservation in
the plane transverse to the axis of the colliding beams,

$$\sum_i \vec{p}_i + \vec{\mathrm{uncl}} + \vec{\not{p}} = \vec{0}, \qquad (3)$$

where the sum over $i$ represents a sum over all identified objects
in the event, the $i$th object has momentum $\vec{p}_i$,
$\vec{\mathrm{uncl}}$ denotes the vector sum of all momentum visible
in the detector but not clustered into an identified object,
$\vec{\not{p}}$ denotes the missing momentum, and the equation is a
two-component vector equality for the components of the momentum
along the two spatial directions transverse to the axis of the
colliding beams. The Sleuth variable $\sum p_T$ is then defined by

$$\sum p_T \equiv \sum_i |\vec{p}_i| + |\vec{\mathrm{uncl}}| + |\vec{\not{p}}|, \qquad (4)$$

where only the momentum components transverse to the axis of the
colliding beams are considered when computing magnitudes.
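
A small Python sketch may help make Eqs. (3) and (4) concrete. It assumes that Eq. (4) is the scalar sum of the transverse momentum magnitudes of the identified objects, the unclustered momentum, and the missing momentum; the function names and the toy event are invented for illustration.

```python
import math


def missing_momentum(objects, uncl):
    """Eq. (3): the missing transverse momentum balances everything seen.
    Momenta are (px, py) pairs in GeV, transverse to the beam axis."""
    px = -(sum(p[0] for p in objects) + uncl[0])
    py = -(sum(p[1] for p in objects) + uncl[1])
    return (px, py)


def sum_pt(objects, uncl, missing):
    """Eq. (4), as assumed here: scalar sum of transverse magnitudes."""
    return (sum(math.hypot(px, py) for px, py in objects)
            + math.hypot(*uncl) + math.hypot(*missing))


# Toy event: two identified objects plus some unclustered momentum.
objects = [(30.0, 40.0), (-30.0, 0.0)]
uncl = (0.0, -10.0)
missing = missing_momentum(objects, uncl)
print(missing, sum_pt(objects, uncl, missing))
```

Note that the vector sum in Eq. (3) vanishes by construction, whereas the scalar sum in Eq. (4) grows with the overall hardness of the event.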



3. Regions 

The algorithm's third step involves searching for re- 
gions in which more events are seen in the data than 
expected from standard model and instrumental back- 
ground. This search is performed in the variable space 
defined in the second step of the algorithm, for each of 
the exclusive final states defined in the first step. 

The steps of the search can be sketched as follows. 



• In each final state, the regions considered are the
one dimensional intervals in $\sum p_T$ extending from
each data point up to infinity. A region is required
to contain at least three data events, as described
in Appendix B.



• In a particular final state, the data point with the $d$th largest
value of $\sum p_T$ defines an interval in the variable $\sum p_T$
extending from this data point up to infinity. This semi-infinite
interval contains $d$ data events. The standard model prediction in
this interval, estimated from the Vista comparison described above,
integrates to $b$ predicted events. In this final state, the interest
of the $d$th region is defined as the Poisson probability
$p_d = \sum_{i=d}^{\infty} \frac{b^i}{i!} e^{-b}$ that the standard
model background $b$ would fluctuate up to or above the observed
number of data events $d$ in this region. The most interesting region
in this final state is the one with smallest Poisson probability.



• For this final state, pseudo experiments are generated, with pseudo
data pulled from the standard model background. For each pseudo
experiment, the interest of the most interesting region is
calculated. An ensemble of pseudo experiments determines the fraction
$\mathcal{P}$ of pseudo experiments in this final state in which the
most interesting region is more interesting than the most interesting
region in this final state observed in the data. If there is no new
physics in this final state, $\mathcal{P}$ is expected to be a random
number pulled from a uniform distribution in the unit interval. If
there is new physics in this final state, $\mathcal{P}$ is expected
to be small.



• Looping over all final states, $\mathcal{P}$ is computed for each
final state. The minimum of these values is denoted
$\mathcal{P}_{min}$. The most interesting region in the final state
with smallest $\mathcal{P}$ is denoted $\mathcal{R}$.



• The interest of the most interesting region $\mathcal{R}$ in the
most interesting final state is defined by
$\tilde{\mathcal{P}} = 1 - \prod_a (1 - \hat{p}_a)$, where the
product is over all Sleuth final states $a$, and $\hat{p}_a$ is the
lesser of $\mathcal{P}_{min}$ and the probability for the total
number of events predicted by the standard model in the final state
$a$ to fluctuate up to or above three data events. The quantity
$\tilde{\mathcal{P}}$ represents the fraction of hypothetical similar
CDF experiments that would produce a final state with
$\mathcal{P} < \mathcal{P}_{min}$. The range of $\tilde{\mathcal{P}}$
is the unit interval. If the data are distributed according to
standard model prediction, $\tilde{\mathcal{P}}$ is expected to be a
random number pulled from a uniform distribution in the unit
interval. If new physics is present, $\tilde{\mathcal{P}}$ is
expected to be small.
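
The steps above can be sketched in Python as follows. This is a toy illustration, not the CDF implementation: the background is assumed to be a list of weighted Monte Carlo $\sum p_T$ values, pseudo data are drawn by resampling those values, ties count as "at least as interesting", and all names and numbers are invented.

```python
import math
import random


def poisson_tail(d, b):
    """p_d = P(N >= d) for N ~ Poisson(b); loses precision for tiny tails."""
    term, below = math.exp(-b), 0.0
    for i in range(d):
        below += term
        term *= b / (i + 1)
    return max(0.0, 1.0 - below)


def most_interesting(data, background, min_events=3):
    """Smallest p_d over the semi-infinite regions starting at each data point."""
    best, ordered = 1.0, sorted(data, reverse=True)
    for d in range(min_events, len(ordered) + 1):
        threshold = ordered[d - 1]
        b = sum(w for x, w in background if x >= threshold)
        best = min(best, poisson_tail(d, b))
    return best


def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, n, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n


def final_state_P(data, background, n_pseudo=2000, seed=1):
    """Fraction of pseudo experiments at least as interesting as the data."""
    rng = random.Random(seed)
    observed = most_interesting(data, background)
    values = [x for x, _ in background]
    weights = [w for _, w in background]
    hits = 0
    for _ in range(n_pseudo):
        n = poisson_sample(rng, sum(weights))
        pseudo = rng.choices(values, weights=weights, k=n)
        if most_interesting(pseudo, background) <= observed:
            hits += 1
    return hits / n_pseudo


def combine(p_min, three_event_probs):
    """P-tilde = 1 - prod_a (1 - min(P_min, prob. of >= 3 events in state a))."""
    prod = 1.0
    for q in three_event_probs:
        prod *= 1.0 - min(p_min, q)
    return 1.0 - prod


# Toy final state: 10 expected background events, with a few data
# events clustered at large summed transverse momentum.
background = [(50.0 + 5.0 * i, 0.1) for i in range(100)]
data = [120.0, 180.0, 410.0, 480.0, 520.0, 530.0]
P = final_state_P(data, background)
print("P for this final state:", P)
print("P-tilde for 3 such final states:", combine(P, [poisson_tail(3, 10.0)] * 3))
```

The number of pseudo experiments limits how small a value of $\mathcal{P}$ can be resolved, so a realistic implementation would need far more than the 2000 used in this toy.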



4. Output

The output of the algorithm is the most interesting region
$\mathcal{R}$ observed in the data, and a number
$\tilde{\mathcal{P}}$ quantifying the interest of $\mathcal{R}$. A
reasonable threshold for discovery is $\tilde{\mathcal{P}} < 0.001$,
which corresponds loosely to a local $5\sigma$ effect after the
trials factor is accounted for.

Although no integration over systematic errors is performed in
computing $\tilde{\mathcal{P}}$, systematic uncertainties do affect
the final Sleuth result. If Sleuth highlights a discrepancy
in a particular final state, explanations in terms
of a correction to the background estimate are consid- 
ered. This process necessarily requires physics judge- 
ment. A reasonable explanation of a Sleuth discrep- 
ancy in terms of an inadequacy in the modeling of the 
detector response or standard model prediction that is 
consistent with external information is fed back into the 
Vista correction model and tested for global consistency. 
In this way, plausible explanations for discrepancies ob- 
served by Sleuth are incorporated into the Vista cor- 
rection model. This iteration continues until either all 
reasonable explanations for a significant Sleuth discrep- 
ancy are exhausted, resulting in a possible new physics 
claim, or no significant Sleuth discrepancy remains. 



B. Sensitivity 

Two important questions must be asked: 

• Will Sleuth find nothing if there is nothing to be 
found? 

• Will Sleuth find something if there is something 
to be found? 

If there is nothing to be found, Sleuth will find nothing 999 times
out of 1000, given a uniform distribution of $\tilde{\mathcal{P}}$
and a discovery threshold of $\tilde{\mathcal{P}} < 0.001$. The
uniform distribution of $\tilde{\mathcal{P}}$ in the absence of new
physics is illustrated in Fig. 6, using values of
$\tilde{\mathcal{P}}$ obtained in pseudo experiments with pseudo data
generated from the standard model prediction. Sleuth will of course
return spurious signals if provided improperly modeled backgrounds.
The algorithm directly addresses the issue of whether an observed
hint is due to a statistical fluctuation. Sleuth itself is unable to
address systematic mismeasurement or incorrect modeling, but is quite
useful in bringing these to attention.

The answer to the second question depends on the degree to which the
new physics satisfies the three assumptions on which Sleuth is based:
new physics will appear predominantly in one final state, at high
summed scalar transverse momentum, and as an excess of data over
standard model prediction. Sleuth's sensitivity to any particular
new phenomenon depends on the extent to which this new phenomenon
satisfies these assumptions.

FIG. 6: Distribution of $10^3$ $\tilde{\mathcal{P}}$ values from
$10^3$ CDF pseudo experiments, in which pseudo data are pulled from
the standard model prediction. The distribution of
$\tilde{\mathcal{P}}$ is shown in the unit interval (upper), with one
entry for each of the CDF pseudo experiments. The distribution of
$\tilde{\mathcal{P}}$ translated into units of standard deviations is
also shown (lower). The distribution of $\tilde{\mathcal{P}}$ from
pseudo experiments is consistent with flat (upper), and consistent
with a Gaussian when translated into units of standard deviations
(lower), as expected.



1. Known standard model processes 

Consideration of specific standard model processes can 
provide intuition for Sleuth's sensitivity to new physics. 
This section tests Sleuth's sensitivity to the production 
of top quark pairs, W boson pairs, single top, and the 
Higgs boson. 

a. Top quark pairs. Top quark pair production results in two b jets
and two W bosons, each of which may decay leptonically or
hadronically. The W branching ratios are such that this signal
predominantly populates the Sleuth final state $W b\bar{b} jj$, where
$W$ denotes an electron or muon and significant missing momentum.
Although the final states $\ell^+\ell^-\not{p}\,b\bar{b}$ were
important in verifying the top quark pair production hypothesis in
the initial observation by CDF [5] and DØ [6] in 1995, most of the

FIG. 7: (Top left) The Sleuth final state $b\bar{b}\,\ell^+\ell'^-\not{p}$, consisting of events with one electron and one muon of opposite sign,
missing momentum, and two or three jets, one or two of which are b-tagged. Data corresponding to 927 pb$^{-1}$ are shown as
filled (black) circles; the standard model prediction is shown as the (red) shaded histogram. (Top right) The same final state
with $t\bar{t}$ subtracted from the standard model prediction. (Bottom row) The Sleuth final state $W b\bar{b} jj$, with the standard model
$t\bar{t}$ contribution included (lower left) and removed (lower right). Significant discrepancies far surpassing Sleuth's discovery
threshold are observed in these final states with $t\bar{t}$ removed from the standard model background estimate. If the top quark
had not been predicted, Sleuth would have discovered it.



statistical power came from the final state Wbbjj. The 
all hadronic decay final state bb 4j has only convincingly 
been seen after integrating substantial Run II luminos- 
ity [37[. Sleuth's first assumption that new physics will 
appear predominantly in one final state is thus reason- 
ably well satisfied. Since the top quark has a mass of 
170.9 ± 1.8 GeV [Hj], the production of two such objects 
leads to a signal at large J2pt relative to the standard 
model background of W bosons produced in association 
with jets, satisfying Sleuth's second and third assump- 
tions. Sleuth is expected to perform reasonably well on 
this example. 

To quantitatively test Sleuth's sensitivity to top quark pair production,
this process is removed from the standard model prediction, and the values
of the Vista correction factors are re-obtained from a global fit assuming
ignorance of $t\bar{t}$ production. Sleuth easily discovers $t\bar{t}$ production in
927 pb$^{-1}$ in the final states $b\bar{b}\ell^+\ell'^-\not{p}$ and $Wb\bar{b}jj$, shown in
Fig. 7. Sleuth finds $\mathcal{P}_{b\bar{b}\ell^+\ell'^-\not{p}} < 1.5 \times 10^{-8}$ and
$\mathcal{P}_{Wb\bar{b}jj} < 8.3 \times 10^{-7}$, far surpassing the discovery threshold of
$\tilde{\mathcal{P}} < 0.001$.

The test is repeated as a function of assumed integrated luminosity, and
Sleuth is found to highlight the top quark signal at an integrated
luminosity of roughly 80 ± 60 pb$^{-1}$, where the large variation arises from
statistical fluctuations in the $t\bar{t}$ signal events. Weaker constraints on
the Vista correction factors at lower integrated luminosity marginally
increase the integrated luminosity required to claim a discovery.

FIG. 8: (Left) The final state $\ell^+\ell'^-\not{p}$, consisting of events with an
electron and muon of opposite sign and missing transverse momentum, in
927 pb$^{-1}$ of CDF data. (Right) The same final state with standard model
$WW$, $WZ$, and $ZZ$ contributions subtracted, and with the Vista correction
factors re-fit in the absence of these contributions. Sleuth finds the
final state $\ell^+\ell'^-\not{p}$ to contain a discrepancy surpassing the discovery
threshold of $\tilde{\mathcal{P}} < 0.001$ with the processes $WW$, $WZ$, and $ZZ$ removed
from the standard model background.

b. W boson pairs. The sensitivity to standard model $WW$ production is
tested by removing this process from the standard model background
prediction and allowing the Vista correction factors to be re-fit. In
927 pb$^{-1}$ of Tevatron Run II data, Sleuth identifies an excess in the
final state $\ell^+\ell'^-\not{p}$, consisting of an electron and muon of opposite
sign and missing momentum. This excess corresponds to
$\tilde{\mathcal{P}} < 2 \times 10^{-4}$, sufficient for the discovery of $WW$, as shown in
Fig. 8.

c. Single top. Single top quarks are produced weakly, and predominantly
decay to populate the Sleuth final state $Wb\bar{b}$, satisfying Sleuth's first
assumption. Single top production will appear as an excess of events,
satisfying Sleuth's third assumption. Sleuth's second assumption is not
well satisfied for this example, since single top production does not lie
at large $\sum p_T$ relative to other standard model processes. Sleuth is thus
expected to be outperformed by a targeted search in this example.

d. Higgs boson. Assuming a standard model Higgs boson of mass
$m_h = 115$ GeV, the dominant observable production mechanisms are
$p\bar{p} \to Wh$ and $p\bar{p} \to Zh$, populating the final states $Wb\bar{b}$,
$\ell^+\ell^-b\bar{b}$, and $\not{p}\,b\bar{b}$. The signal is thus spread over three Sleuth
final states. Events in the last of these ($\not{p}\,b\bar{b}$) do not pass the
Vista event selection, which does not use $\not{p}$ as a trigger object.
Sleuth's first assumption is thus poorly satisfied for this example. The
standard model Higgs boson signal will appear as an excess, but as in the
case of single top production it does not appear at particularly large
$\sum p_T$ relative to other standard model processes. Since the standard
model Higgs boson poorly satisfies Sleuth's first and second assumptions,
a targeted search for this specific signal is expected to outperform
Sleuth.



2. Specific models of new physics 

To build intuition for Sleuth's sensitivity to new 
physics signals, several sensitivity tests are conducted for 
a variety of new physics possibilities. Some of the new 
physics models chosen have already been considered by 
more specialized analyses within CDF, making possible 
a comparison between Sleuth's sensitivity and the sen- 
sitivity of these previous analyses. 

Sleuth's sensitivity can be compared to that of a dedicated search by
determining the minimum new physics cross section $\sigma_{\min}$ required for a
discovery by each. The discovery for Sleuth occurs when
$\tilde{\mathcal{P}} < 0.001$. In most Sleuth regions satisfying the discovery threshold
of $\tilde{\mathcal{P}} < 0.001$, the probability for the predicted number of events to
fluctuate up to or above the number of events observed corresponds to
greater than $5\sigma$. The discovery for the dedicated search occurs when the
observed excess of data corresponds to a $5\sigma$ effect. Smaller $\sigma_{\min}$
corresponds to greater sensitivity.
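
As a rough illustration of the two thresholds compared here, the following Python sketch converts between a probability and a significance in units of standard deviations. It assumes the one-sided Gaussian tail convention, which is an assumption of this example and is not stated in the text; the function names are illustrative only.

```python
import math
from statistics import NormalDist


def p_to_sigma(p):
    """Significance z with one-sided Gaussian tail P(Z >= z) = p (assumed convention)."""
    return NormalDist().inv_cdf(1.0 - p)


def sigma_to_p(z):
    """One-sided Gaussian tail probability of a z-sigma effect."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # A probability of 0.001 expressed in standard deviations.
    print(f"p = 0.001 -> {p_to_sigma(0.001):.2f} sigma")
    # The tail probability corresponding to a 5 sigma effect.
    print(f"5 sigma   -> p = {sigma_to_p(5.0):.2e}")
```

Under this convention a probability of 0.001 is roughly a three standard deviation effect, while $5\sigma$ corresponds to a probability of a few times $10^{-7}$.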

The sensitivity tests are performed by first generating pseudo data from
the standard model background prediction. Signal events for the new
physics model are generated, passed through the chain of CDF detector
simulation and event reconstruction, and consecutively added to the pseudo
data until Sleuth finds $\tilde{\mathcal{P}} < 0.001$. The number of signal events needed
to trigger discovery is used to calculate $\sigma_{\min}$.

Model  Description                                                          Sensitivity
1      GMSB, $\Lambda = 82.6$ GeV, $\tan\beta = 15$, $\mu > 0$,              graphical; $\sigma_{\min}$ axis 0.02 to 0.22 pb
       with one messenger of $M = 2\Lambda$.
2      $Z' \to \ell^+\ell^-$, $m_{Z'} = 250$ GeV, with standard model         graphical; $\sigma_{\min}$ axis 1 to 1.8 pb
       couplings to leptons.
3      $Z' \to q\bar{q}$, $m_{Z'} = 700$ GeV, with standard model              graphical; $\sigma_{\min}$ axis 3 to 5.5 pb
       couplings to quarks.
4      $Z' \to q\bar{q}$, $m_{Z'} = 1$ TeV, with standard model                graphical; $\sigma_{\min}$ axis 1.3 to 2.1 pb
       couplings to quarks.
5      $Z' \to t\bar{t}$, $m_{Z'} = 500$ GeV, with standard model              graphical
       couplings to $t\bar{t}$.

TABLE III: Summary of Sleuth's sensitivity to several new physics models,
expressed in terms of the minimum production cross section needed for
discovery with 927 pb$^{-1}$. Where available, a comparison is made to the
sensitivity of a dedicated search for this model. The solid (red) box
represents Sleuth's sensitivity, and the open (white) box represents the
sensitivity of the dedicated analysis. Systematic uncertainties are not
included in the sensitivity calculation. The width of each box shows
typical variation under fluctuation of data statistics. In Models 3 and 4,
there is no targeted analysis available for comparison. Sleuth is seen to
perform comparably to the targeted analyses on models satisfying the
assumptions on which Sleuth is based.

For each dedicated analysis to which Sleuth is compared, the number of
standard model events expected in 927 pb$^{-1}$ within the region targeted is
used to calculate the number of signal events required in that region to
produce a discrepancy corresponding to $5\sigma$. Using the signal efficiency
determined in the dedicated analysis, $\sigma_{\min}$ is calculated. The effects
of systematic uncertainties are removed from the dedicated analyses, and
are not included for Sleuth. The inclusion of systematic uncertainties
will reduce the sensitivity of both Sleuth and the dedicated analysis to
the extent that the systematic parameters are allowed to vary. Vista and
Sleuth have the advantage of using a large data set to constrain them.
The results of five such sensitivity tests are summarized in Table III.
Sleuth is seen to perform comparably to targeted analyses on models
satisfying the assumptions on which Sleuth is based. For models in which
Sleuth's simple use of $\sum p_T$ can be improved upon by optimizing for a
specific feature, a targeted search may be expected to achieve greater
sensitivity. One of the important features of Sleuth is that it not only
performs reasonably well, but that it does so broadly. In Model 1, a search
for a particular model point in a gauge mediated supersymmetry breaking
(GMSB) scenario, Sleuth gains an advantage by exploiting a final state not
considered in the targeted analysis [39]. In Model 2, a search for a $Z'$
decaying to lepton pairs, the targeted analysis [40] exploits the narrow
resonance in the $e^+e^-$ invariant mass. In Models 3 and 4, which are
searches for a hadronically decaying $Z'$ of different masses, there is no
targeted analysis against which to compare. In Model 5, a search for a
$Z' \to t\bar{t}$ resonance, the signal appears at large summed scalar transverse
momentum in a particular final state, resulting in comparable sensitivity
between Sleuth and the targeted analysis [41].

FIG. 9: The distribution of $\mathcal{P}$ in the data, with one entry for each final
state considered by Sleuth.



C. Results 

The distribution of $\mathcal{P}$ for the final states considered by Sleuth in the
data is shown in Fig. 9. The concavity of this distribution reflects the
degree to which the correction model described in Sec. III F has been
tuned. A crude correction model tends to produce a distribution that is
concave upwards, as seen in this figure, while an overly tuned correction
model produces a distribution that is concave downwards, with more final
states than expected having $\mathcal{P}$ near the midpoint of the unit interval.

The most interesting final states identified by Sleuth are shown in
Fig. 10, together with a quantitative measure ($\mathcal{P}$) of the interest of the
most interesting region in each final state, determined as described in
Sec. IV A 3. The legends of Fig. 10 show the primary contributing standard
model processes in each of these final states, together with the
fractional contribution of each. The top six final states, which correspond
to entries in the leftmost bin in Fig. 9, span a range of populations,
relevant physics objects, and important background contributions.

The final state $b\bar{b}$, consisting of two or three reconstructed jets, one
or two of which are $b$-tagged, heads the list. These events enter the
analysis by satisfying the Vista offline selection requiring one or more
jets or $b$-jets with $p_T > 200$ GeV. The definition of Sleuth's $\sum p_T$
variable is such that all events in this final state consequently have
$\sum p_T > 400$ GeV. Sleuth chooses the region $\sum p_T > 469$ GeV, which
includes nearly $10^4$ data events. The standard model prediction in this
region is sensitive to the $b$-tagging efficiency $p(b \to b)$ and the fake
rate $p(j \to b)$, which have few strong constraints on their values for jets
with $p_T > 200$ GeV other than those imposed by other Vista kinematic
distributions within this and a few other related final states. For this
region Sleuth finds $\mathcal{P}_{b\bar{b}} = 0.0055$, which is unfortunately not
statistically significant after accounting for the trials factor
associated with looking in many different final states, as discussed
below.

The final state $j\not{p}$, consisting of events with one reconstructed jet and
significant missing transverse momentum, is the second final state
identified by Sleuth. The primary background is due to non-collision
processes, including cosmic rays and beam halo backgrounds, whose
estimation is discussed in Appendix A 2 a. Since the hadronic energy is not
required to be deposited in time with the beam crossing, Sleuth's analysis
of this final state is sensitive to particles with a lifetime between 1 ns
and 1 μs that lodge temporarily in the hadronic calorimeter, complementing
Ref. [42].

The final states $\ell^+\ell'^+\not{p}jj$, $\ell^+\ell'^+\not{p}$, and $\ell^+\ell'^+$ all contain
an electron ($\ell$) and muon ($\ell'$) with identical reconstructed charge
(either both positive or both negative). The final states with and without
missing transverse momentum are qualitatively different in terms of the
standard model processes contributing to the background estimate. The
final state $\ell^+\ell'^+$ is composed mostly of dijets where one jet is
misreconstructed as an electron and a second jet is misreconstructed as a
muon; $Z \to \tau^+\tau^-$, where one tau decays to a muon and the other to a
leading $\pi^0$, one of the two photons from which converts while traveling
through the silicon support structure to result in an electron
reconstructed with the same sign as the muon, as described in
Appendix A 1; and $Z \to \mu^+\mu^-\gamma$, in which a photon is produced, converts,
and is misreconstructed as an electron. The final states containing
missing transverse momentum are dominated by the production of
$W(\to \mu\nu)$ in association with one or more jets, with one of the jets
misreconstructed as an electron. The muon is significantly more likely
than the electron to have been produced in the hard interaction, since the
fake rate $p(j \to \mu)$ is roughly an order of magnitude smaller than the
fake rate $p(j \to e)$, as observed in Table I. The final state
$\ell^+\ell'^+\not{p}jj$, which contains two or three reconstructed jets in addition
to the electron, muon, and missing transverse momentum, also has some
contribution from $WZ$ and top quark pair production.

The final state $\tau\not{p}$ contains one reconstructed tau, significant missing
transverse momentum, and one reconstructed jet with $p_T > 200$ GeV. This
final state in principle also contains events with one reconstructed tau,
significant missing transverse momentum, and zero reconstructed jets, but
such events do not satisfy the offline selection criteria described in
Sec. III C. Roughly half of the background is non-collision, in which two
different cosmic ray muons (presumably from the same cosmic ray shower)
leave two distinct energy deposits in the CDF hadronic calorimeter, one
with $p_T > 200$ GeV, and one with a single associated track from a $p\bar{p}$
collision occurring during the same bunch crossing. Less than a single
event is predicted from this non-collision source (using techniques
described in Appendix A 2 a) over the past five years of Tevatron running.

FIG. 10: The most interesting final states identified by Sleuth. The region
chosen by Sleuth, extending up to infinity, is shown by the (blue) arrow
just below the horizontal axis. Data are shown as filled (black) circles,
and the standard model prediction is shown as the shaded (red) histogram.
The Sleuth final state is labeled in the upper left corner of each panel,
with $\ell$ denoting $e$ or $\mu$, and $\ell^+\ell'^+$ denoting an electron and muon with
the same electric charge. The number at upper right in each panel shows
$\mathcal{P}$, the fraction of hypothetical similar experiments in which something
at least as interesting as the region shown would be seen in this final
state. The inset in each panel shows an enlargement of the region selected
by Sleuth, together with the number of events (SM) predicted by the
standard model in this region, and the number of data events (d) observed
in this region.

In these CDF data, Sleuth finds $\tilde{\mathcal{P}} = 0.46$. The fraction of hypothetical
similar CDF experiments (assuming a fixed standard model prediction,
detector simulation, and correction model) that would exhibit a final
state with $\mathcal{P}$ smaller than the smallest $\mathcal{P}$ observed in the CDF Run II
data is approximately 46%. The actual value obtained for $\tilde{\mathcal{P}}$ is not of
particular interest, except to note that this value is significantly
greater than the threshold of $\tilde{\mathcal{P}} < 0.001$ required to claim an effect of
statistical significance. Sleuth has not revealed a discrepancy of
sufficient statistical significance to justify a new physics claim.
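
The role of the trials factor can be illustrated with a deliberately simplified Python sketch. It assumes that the final states are statistically independent and that each is equally capable of producing a per-final-state probability as small as the smallest one observed; this is an assumption of the example, not the definition used by Sleuth (see Sec. IV A 3). The inputs 0.0055 and 0.46 are the values quoted in the text, and the "effective number of trials" printed below is a derived illustration, not the number of final states actually considered.

```python
import math


def p_tilde_independent(p_min, n_trials):
    """Probability that at least one of n_trials independent final states
    yields a value as small as p_min (simplified trials-factor model)."""
    return 1.0 - (1.0 - p_min) ** n_trials


def effective_trials(p_min, p_tilde):
    """Number of independent trials that would map p_min onto p_tilde
    in the simplified model above."""
    return math.log(1.0 - p_tilde) / math.log(1.0 - p_min)


if __name__ == "__main__":
    p_min = 0.0055    # smallest per-final-state value quoted in the text
    p_tilde = 0.46    # trials-corrected value quoted in the text
    n_eff = effective_trials(p_min, p_tilde)
    print(f"effective number of independent trials: {n_eff:.0f}")
    print(f"check: {p_tilde_independent(p_min, n_eff):.2f}")
```

Even in this crude model, a per-final-state probability of a few times $10^{-3}$ becomes unremarkable once roughly a hundred independent places to look are accounted for.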

Systematics are incorporated into Sleuth in the form of the flexibility in
the Vista correction model, as described previously. This flexibility is
significantly more important in practice than the uncertainties on
particular correction factor values obtained from the fit, although the
latter are easier to discuss. The relative importance of correction factor
value uncertainties on Sleuth's result depends on the number of predicted
standard model events ($b$) in Sleuth's high-$\sum p_T$ tail. The uncertainties
on the correction factors of Table I are such that the appropriate
addition in quadrature gives a typical uncertainty of $\approx 10\%$ on the total
background prediction in each final state. Using
$\sigma_{\rm sys} \approx 10\% \times b$ and $\sigma_{\rm stat} \approx \sqrt{b}$, the relative importance of
systematic uncertainty and statistical uncertainty is estimated to be
$\sigma_{\rm sys}/\sigma_{\rm stat} = 10\% \times b/\sqrt{b}$. The importance of systematic and
statistical uncertainties is thus comparable for high-$\sum p_T$ tails
containing $b \sim 100$ predicted events. The effect of systematic
uncertainties is provided in this approximation rather than through a
rigorous integration over these uncertainties as nuisance parameters due
to the high computational cost of performing the integration. This
estimate of systematic uncertainty is valid only within the particular
correction model resulting in the list of correction factors shown in
Table I; additional changes to the correction model may result in larger
variation. The inclusion of additional systematic uncertainties does not
qualitatively change the conclusion that Sleuth has not revealed a
discrepancy of sufficient statistical significance to justify a new
physics claim.
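
The estimate above amounts to the ratio $0.1\sqrt{b}$, which crosses unity at $b = 100$. A minimal Python sketch of this arithmetic follows; the function name and the sample values of $b$ are illustrative, and the 10% fractional uncertainty is the typical value quoted in the text.

```python
import math


def sys_over_stat(b, frac_sys=0.10):
    """Ratio of systematic to statistical uncertainty for b predicted events,
    using sigma_sys = frac_sys * b and sigma_stat = sqrt(b)."""
    sigma_sys = frac_sys * b
    sigma_stat = math.sqrt(b)
    return sigma_sys / sigma_stat


if __name__ == "__main__":
    for b in (10, 100, 1000):
        print(f"b = {b:5d}: sigma_sys / sigma_stat = {sys_over_stat(b):.2f}")
```

Tails with far fewer than 100 predicted events are statistics dominated in this approximation, while tails with far more are dominated by the correction factor uncertainties.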

Due to the large number of final states considered, there are regions
(such as those shown in Fig. 10) in which the probability for the standard
model prediction to fluctuate up to or above the number of events observed
in the data corresponds to a significance exceeding $3\sigma$ if the
appropriate trials factor is not accounted for. A doubling of data may
therefore result in discovery. In particular, although the excesses in
Fig. 10 are currently consistent with simple statistical fluctuations, if
any of them are genuinely due to new physics, Sleuth will find they pass
the discovery threshold of $\tilde{\mathcal{P}} < 0.001$ with roughly a doubling of data.



V. CONCLUSIONS 

A broad search for new physics (Vista) has been performed in 927 pb$^{-1}$ of
CDF Run II data. A complete standard model background estimate has been
obtained and compared with data in 344 populated exclusive final states
and 16,486 relevant kinematic distributions, most of which have not been
previously considered. Consideration of exclusive final state populations
yields no statistically significant ($> 3\sigma$) discrepancy after the trials
factor is accounted for. Quantifying the difference in shape of kinematic
distributions using the Kolmogorov-Smirnov statistic, significant
discrepancies are observed between data and standard model prediction.
These discrepancies are believed to arise from mismodeling of the parton
shower and intrinsic $k_T$, and represent observables for which a QCD-based
understanding is highly motivated. None of the shape discrepancies
highlighted motivates a new physics claim.

A further systematic search (Sleuth) for regions of excess on the
high-$\sum p_T$ tails of exclusive final states has been performed,
representing a quasi-model-independent search for new electroweak scale
physics. Most of the exclusive final states searched with Sleuth have not
been considered by previous Tevatron analyses. A measure of interest
rigorously accounting for the trials factor associated with looking in
many regions with few events is defined, and used to quantify the most
interesting region observed in the CDF Run II data. No region of excess on
the high-$\sum p_T$ tail of any of the Sleuth exclusive final states surpasses
the discovery threshold.

Although this global analysis of course cannot prove that no new physics
is hiding in these data, this broad search of the Tevatron Run II data
represents one of the single most encompassing tests of the particle
physics standard model at the energy frontier.

TABLE IV: Central single particle misidentification matrix. Using a single
particle gun, $10^5$ particles of each type shown at the left of the table
are shot with $p_T = 25$ GeV into the central CDF detector, uniformly
distributed in $\theta$ and in $\phi$. The resulting reconstructed object types are
shown at the top of the table, labeling the table columns. Thus the
rightmost element of this matrix in the fourth row from the bottom shows
$p(\tau^- \to b)$, the number of negatively charged tau leptons (out of $10^5$)
reconstructed as a $b$-tagged jet.

APPENDIX A: VISTA CORRECTION MODEL DETAILS

This appendix contains details of the Vista correction model.
Appendix A 1 covers the physical mechanisms underlying fake rates.
Appendix A 2 contains information about additional background sources,
including backgrounds from cosmic rays and beam halo, multiple
interactions, and the effects of intrinsic $k_T$. Appendix A 3 contains
details of the Vista correction factor fit, including the construction of
the $\chi^2$ function that is minimized and the resulting covariance matrix.
Appendix A 4 discusses the values of the correction factors that are
obtained.



1. Fake rate physics 

The following facts begin to build a unified understanding of fake rates
for electrons, muons, taus, and photons. This understanding is woven
throughout the Vista correction model, and significantly informs and
constrains the Vista correction process. Explicit constraints derived from
these studies are provided in Appendix A 3. The underlying physical
mechanisms for these fakes lead to simple and well justified relations
among them.

Table IV shows the response of the CDF detector simulation,
reconstruction, and object identification algorithms to single particles.
Using a single particle gun, $10^5$ particles of each type shown at the left
of the table are shot with $p_T = 25$ GeV into the CDF detector, uniformly
distributed in $\theta$ and in $\phi$. The resulting reconstructed object types are
shown at the top of the table, labeling the columns. The first four
entries on the diagonal at upper left show the efficiency for
reconstructing electrons and muons [56]. The fraction of electrons
misidentified as photons, shown in the top row, seventh column, is seen to
be roughly equal to the fraction of photons identified as electrons or
positrons, shown in the fifth row, first and second columns, and measures
the number of radiation lengths in the innermost regions of the CDF
tracker. The fraction of $B$ mesons identified as electrons or muons,
primarily through semileptonic decay, is shown in the four left columns,
eleventh through fourteenth rows. Other entries provide similarly useful
information, most easily comprehensible from simple physics.

The transverse momenta of the objects reconstructed from single particles
are displayed in Fig. 11. The relative resolutions for the measurement of
electron and muon momenta are shown in the first four histograms on the
diagonal at upper left. The histograms in the left column, sixth through
eighth rows, show that single neutral pions misreconstructed as electrons
have their momenta well measured, while single charged pions
misreconstructed as electrons have their momenta systematically
undermeasured, as discussed below. The histogram in the top row, second
column from the right, shows that electrons misreconstructed as jets have
their energies systematically overmeasured. Other histograms in Fig. 11
contain similarly relevant information, easily overlooked without the
benefit of this study, but understandable from basic physics
considerations once the effect has been brought to attention.

Here and below $p(q \to X)$ denotes a quark fragmenting to $X$ carrying nearly
all of the parent quark's energy, and $p(j \to X)$ denotes a parent quark or
gluon being misreconstructed in the detector as $X$.

The probability for a light quark jet to be misreconstructed as an $e^+$ can
be written

    $p(j \to e^+) = p(q \to \gamma)\,p(\gamma \to e^+) + p(q \to \pi^0)\,p(\pi^0 \to e^+)$
    $\qquad\qquad + \; p(q \to \pi^+)\,p(\pi^+ \to e^+) + p(q \to K^+)\,p(K^+ \to e^+)$.    (A1)

A similar equation holds for a light quark jet faking an $e^-$.

FIG. 11: Transverse momentum distribution of reconstructed objects
(labeling columns) arising from single particles (labeling rows) with
$p_T = 25$ GeV shot from a single particle gun into the central CDF detector.
The area under each histogram is equal to the number of events in the
corresponding misidentification matrix element of Table IV, with the
vertical axis of each histogram scaled to the peak of each distribution. A
different vertical scale is used for each histogram, and histograms with
fewer than ten events are not shown. The horizontal axis ranges from 0 to
50 GeV.

The probability for a light quark jet to be misreconstructed as a $\mu^+$ can
be written

    $p(j \to \mu^+) = p(q \to \pi^+)\,p(\pi^+ \to \mu^+) + p(q \to K^+)\,p(K^+ \to \mu^+)$.    (A2)

Here $p(\pi \to \mu)$ denotes pion decay-in-flight, and $p(K \to \mu)$ denotes kaon
decay-in-flight; other processes contribute negligibly. A similar equation
holds for a light quark jet faking a $\mu^-$.

The only non-negligible underlying physical mechanisms for a jet to fake a
photon are for the parent quark or gluon to fragment into a photon or a
neutral pion, carrying nearly all the energy of the parent quark or gluon.
Thus

    $p(j \to \gamma) = p(q \to \pi^0)\,p(\pi^0 \to \gamma) + p(q \to \gamma)\,p(\gamma \to \gamma)$.    (A3)

Up and down quarks and gluons fragment nearly equally to each species of
pion; hence

    $\frac{1}{3}\,p(q \to \pi) = p(q \to \pi^+) = p(q \to \pi^-) = p(q \to \pi^0)$,    (A4)

where $p(q \to \pi)$ denotes fragmentation into any pion carrying nearly all of
the parent quark's energy. Fragmentation into each type of kaon also
occurs with equal probability; hence

    $\frac{1}{4}\,p(q \to K) = p(q \to K^+) = p(q \to K^-) = p(q \to K^0) = p(q \to \bar{K}^0)$,    (A5)

where $p(q \to K)$ denotes fragmentation into any kaon carrying nearly all of
the parent quark's energy.

Pythia contains a parameter that sets the number of string fragmentation
kaons relative to the number of fragmentation pions. The default value of
this parameter, which has been tuned to LEP I data, is 0.3; for every 1 up
quark and every 1 down quark, 0.3 strange quarks are produced. Strange
particles are produced perturbatively in the hard interaction itself, and
in perturbative radiation, at a ratio larger than 0.3:1:1. This leads to
the inequality

    $0.3 < \frac{p(q \to K)}{p(q \to \pi)} < 1$,    (A6)

where $p(q \to K)$ and $p(q \to \pi)$ are as defined above.

FIG. 12: A few of the most discrepant distributions in the final states
$ej$ and $j\mu$, which are greatly affected by the fake rates $p(j \to e)$ and
$p(j \to \mu)$, respectively. These distributions are among the 13
significantly discrepant distributions identified as resulting from
coarseness of the correction model employed. The vertical axis shows the
number of events; the horizontal axes show the transverse momentum and
pseudorapidity of the lepton. Filled (black) circles show CDF data, and
the shaded (red) histogram shows the standard model prediction. Events
enter the $ej$ final state either on a central electron trigger with
$p_T > 25$ GeV, or on a plug electron trigger with $p_T > 40$ GeV. The fake rate
$p(j \to e)$ is significantly larger in the plug region than in the central
region of the CDF detector. Muons are identified with separate detectors
covering the regions $|\eta| < 0.6$ and $0.6 < |\eta| < 1.0$.



The probability for a jet to be misreconstructed as a tau lepton can be
written

    $p(j \to \tau^+) = p(j \to \tau_1^+) + p(j \to \tau_3^+)$,    (A7)

where $p(j \to \tau_1^+)$ denotes the probability for a jet to fake a 1-prong
tau, and $p(j \to \tau_3^+)$ denotes the probability for a jet to fake a 3-prong
tau. For 1-prong taus,

    $p(j \to \tau_1^+) = p(q \to \pi^+)\,p(\pi^+ \to \tau_1^+) + p(q \to K^+)\,p(K^+ \to \tau_1^+)$.    (A8)

Similar equations hold for negatively charged taus.

Figure 14 shows the probability for a quark (or gluon) to fake a one-prong
tau, as a function of transverse momentum. Using fragmentation functions
tuned on LEP I data, Pythia predicts the probability for a quark jet to
fake a one-prong tau to be roughly four times the probability for a gluon
jet to fake a one-prong tau. This difference in fragmentation is
incorporated into Vista's treatment of jets faking electrons, muons, taus,
and photons. The Vista correction model includes such correction factors
as the probability for a jet with a parent quark to fake an electron (0033
and 0034) and the probability for a jet with a parent quark to fake a muon
(0035); the probability for a jet with a parent gluon to fake an electron
or muon is then obtained by dividing the values of these fitted correction
factors by four.

The physical mechanism underlying the process whereby an incident photon
or neutral pion is misreconstructed as an electron is a conversion in the
material serving as the support structure of the silicon vertex detector.
This process produces exactly as many $e^+$ as $e^-$, leading to

    $\frac{1}{2}\,p(\gamma \to e) = p(\gamma \to e^+) = p(\gamma \to e^-)$
    $\frac{1}{2}\,p(\pi^0 \to e) = p(\pi^0 \to e^+) = p(\pi^0 \to e^-)$,    (A9)

where $e$ is an electron or positron.

FIG. 13: A few of the most discrepant distributions in the final states
$j\tau$ and $j\gamma$, which are greatly affected by the fake rates $p(j \to \tau)$ and
$p(j \to \gamma)$, respectively. The vertical axis shows the number of events;
the horizontal axes show the transverse momentum and pseudorapidity of the
tau lepton and photon. Filled (black) circles show CDF data, and the
shaded (red) histogram shows the standard model prediction. The
distributions in the $j\gamma$ final state are among the 13 significantly
discrepant distributions identified as resulting from coarseness of the
correction model employed.

From Fig. 11, the average $p_T$ of electrons reconstructed from 25 GeV
incident photons is 23.9 ± 1.4 GeV. The average $p_T$ of electrons
reconstructed from incident 25 GeV neutral pions is 23.7 ± 1.3 GeV.

The charge asymmetry between $p(K^+ \to e^+)$ and $p(K^- \to e^-)$ in Table IV
arises because $K^-$ can capture on a nucleon, producing a single hyperon.
Conservation of baryon number and strangeness prevents $K^+$ from capturing
on a nucleon, reducing the $K^+$ cross section relative to the $K^-$ cross
section by roughly a factor of two. The physical process primarily
responsible for $\pi^\pm \to e^\pm$ is inelastic charge exchange

    $\pi^- p \to \pi^0 n$
    $\pi^+ n \to \pi^0 p$    (A10)

occurring within the electromagnetic calorimeter. The charged pion leaves
the "electron's" track in the CDF tracking chamber, and the $\pi^0$ produces
the "electron's" electromagnetic shower. No true electron appears at all
in this process, except as secondaries in the electromagnetic shower
originating from the $\pi^0$.

The average $p_T$ of reconstructed "electrons" originating from a single
charged pion is 18.8 ± 2.2 GeV, indicating that the misreconstructed
"electron" in this case is measured to have on average only 75% of the
total energy of the parent quark or gluon. This is expected, since the
recoiling nucleon from the charge exchange process carries some of the
incident pion's momentum.

FIG. 14: The probability for a generated parton to be misreconstructed as
a one-prong $\tau$, as a function of the parton's generated $p_T$. Filled (red)
circles show the probability for a jet arising from a parent quark to be
misreconstructed as a one-prong tau. Filled (blue) triangles show the
probability for a jet arising from a parent gluon to be misreconstructed
as a one-prong tau.

An additional small loss in energy for a jet misreconstructed as an electron, photon, or muon is expected since the leading $\pi^+$, $K^+$, $\pi^0$, or $\gamma$ takes only some fraction of the parent quark's energy.

The cross sections for $\pi^- p \to \pi^0 n$ and $\pi^+ n \to \pi^0 p$, proceeding through the isospin $I$ conserving and $I_3$ independent strong interaction, are roughly equal. The corresponding particles in the two reactions are related by interchanging the signs of their z-components of isospin.



The probability for a 25 GeV $\pi^+$ to decay to a $\mu^+$ can be written

$$p(\pi^+ \to \mu^+) = p(\text{decays within tracker}) + p(\text{decays within calorimeter}). \tag{A11}$$



The probability for the pion to decay within the tracking volume is

$$p(\text{decays within tracker}) = 1 - e^{-R_{\rm tracker} / (\gamma\,(c\tau))}, \tag{A12}$$



where $\gamma = 25\ \text{GeV} / 140\ \text{MeV} = 180$ is the pion's Lorentz boost, the proper decay length of the charged pion is $(c\tau) = 7.8$ meters, and the radius of the CDF tracking volume is $R_{\rm tracker} = 1.5$ meters, giving $p(\text{decays within tracker}) = 0.001$. The probability for the pion to decay within the calorimeter volume is

$$p(\text{decays within calorimeter}) \approx \frac{\lambda_I}{\gamma\,(c\tau)}, \tag{A13}$$

where $\lambda_I \approx 0.4$ meters is the nuclear interaction length for charged pions on lead or iron and the path length through the calorimeter is $L_{\rm cal} \approx 2$ meters, leading to $p(\text{decays within calorimeter}) \approx 0.00025$. Summing the contributions from decay within the tracking volume and decay within the calorimeter volume, $p(\pi^+ \to \mu^+) \approx 0.00125$.
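The arithmetic behind Eqs. A11 to A13 can be checked with a short numerical sketch. The function name and argument names below are illustrative and not part of the analysis; the inputs are the values quoted in the text. With an unrounded Lorentz boost the results land close to, though not exactly on, the rounded probabilities quoted above.

```python
import math


def decay_probabilities(energy_gev=25.0, mass_gev=0.140, ctau_m=7.8,
                        r_tracker_m=1.5, lambda_i_m=0.4):
    """Decay-in-flight probabilities for a charged pion (illustrative helper).

    Returns the Lorentz boost, the probability to decay inside the tracking
    volume (form of Eq. A12) and inside the calorimeter (form of Eq. A13).
    """
    gamma = energy_gev / mass_gev            # Lorentz boost of the pion
    decay_length_m = gamma * ctau_m          # mean lab-frame decay length
    p_tracker = 1.0 - math.exp(-r_tracker_m / decay_length_m)
    p_calorimeter = lambda_i_m / decay_length_m
    return gamma, p_tracker, p_calorimeter


gamma, p_trk, p_cal = decay_probabilities()
print(f"gamma = {gamma:.0f}")
print(f"p(decays within tracker)     = {p_trk:.5f}")
print(f"p(decays within calorimeter) = {p_cal:.5f}")
print(f"p(pi+ -> mu+)                = {p_trk + p_cal:.5f}")
```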

The primary physical mechanism by which a jet fakes a photon is for the parent quark or gluon to fragment into a leading $\pi^0$ carrying nearly all the momentum. The highly boosted $\pi^0$ decays within the beam pipe to two photons that are sufficiently collinear to appear in the preshower, electromagnetic calorimeter, and shower maximum detector as a single photon. Thus



$$p(q \to \gamma) = p(q \to \pi^0)\,p(\pi^0 \to \gamma). \tag{A14}$$



An immediate corollary is that the misreconstructed 
"photon" carries the energy of the parent quark or gluon, 
and is well measured. 

Typical jets are measured with poorer energy resolu- 
tion than jets that have faked electrons, muons, or pho- 
tons. 

Since $p(q \to \pi^0) \gg p(q \to \gamma)$, it follows from Eq. A4 and Table IV that the conversion contribution to $p(j \to e)$ is ≈ 75%, and the charge exchange contribution is ≈ 25%:

$$0.75 \approx \frac{p(q \to \gamma)\,p(\gamma \to e) + p(q \to \pi^0)\,p(\pi^0 \to e)}{p(j \to e)},$$
$$0.25 \approx \frac{p(q \to \pi^+)\,p(\pi^+ \to e^+) + p(q \to K^+)\,p(K^+ \to e^+)}{p(j \to e)}. \tag{A15}$$



The number of $e^+ j$ events in data is 0.9 times the number of $e^- j$ events. This charge asymmetry arises from $p(K^+ \to e^+)$ and $p(K^- \to e^-)$ in Table IV. Quantitatively,

$$\frac{p(j \to e^+)}{p(j \to e^-)} = \frac{0.9 + 0.2\,p(K^+ \to e^+)/p(K \to e)}{0.9 + 0.2\,p(K^- \to e^-)/p(K \to e)}, \tag{A16}$$



where 0.9 is the sum of 0.75 from Eq. A15 and $0.15 \approx 0.25 \times 0.6$ from Eq. \M\ and 0.2 is twice $1 - 0.9$. From $p(K^+ \to e^+)$ and $p(K^- \to e^-)$ in Table IV, $p(K^+ \to e^+)/p(K \to e) = 1/3$ and $p(K^- \to e^-)/p(K \to e) = 2/3$, predicting $p(j \to e^+)/p(j \to e^-) = 0.935$, in reasonable agreement with the ratio of the observed number of events in the $e^+ j$ and $e^- j$ final states.
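As a quick check of Eq. A16, the predicted ratio can be evaluated directly from the numbers quoted in the text. The function and argument names are illustrative only.

```python
def charge_ratio(kaon_plus_share, kaon_minus_share,
                 symmetric_part=0.9, kaon_weight=0.2):
    """Evaluate the right hand side of Eq. A16.

    kaon_plus_share  stands for p(K+ -> e+) / p(K -> e),
    kaon_minus_share stands for p(K- -> e-) / p(K -> e).
    """
    numerator = symmetric_part + kaon_weight * kaon_plus_share
    denominator = symmetric_part + kaon_weight * kaon_minus_share
    return numerator / denominator


ratio = charge_ratio(1.0 / 3.0, 2.0 / 3.0)
print(f"p(j -> e+) / p(j -> e-) = {ratio:.3f}")
```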

The number of $j\,\mu^+$ events observed in CDF Run II is 1.1 times the number of $j\,\mu^-$ events observed. This charge asymmetry arises from $p(K^+ \to \mu^+)$ and $p(K^- \to \mu^-)$ in Table EI

The physical mechanism by which a prompt photon fakes a tau lepton is for the photon to convert, producing an electron or positron carrying most of the photon's energy, which is then misreconstructed as a tau. The probability for this to occur is equal for positively and negatively charged taus,

$$\tfrac{1}{2}\,p(\gamma \to \tau) = p(\gamma \to \tau^+) = p(\gamma \to \tau^-), \tag{A17}$$

and is related to previously defined quantities by

$$p(\gamma \to \tau) = p(\gamma \to e)\,\frac{1}{p(e \to e)}\,p(e \to \tau), \tag{A18}$$

where $p(\gamma \to e)$ denotes the fraction of produced photons that are reconstructed as electrons, $p(e \to e)$ denotes the fraction of produced electrons that are reconstructed as electrons, and hence $p(\gamma \to e)/p(e \to e)$ is the fraction of produced photons that pair produce a single leading electron.

Note $p(e \to \gamma) \approx p(\gamma \to e)$ from Table IV, as expected, with a value of ≈ 0.03 determined by the amount of material in the inner detectors and the tightness of isolation criteria. A hard bremsstrahlung followed by a conversion is responsible for electrons being reconstructed with the opposite sign; hence

$$p(e^\pm \to e^\mp) = p(e^+ \to e^-) = p(e^- \to e^+) \approx \tfrac{1}{2}\,p(e^\pm \to \gamma)\,p(\gamma \to e^\mp), \tag{A19}$$

where the factor of 1/2 comes because the material already traversed by the $e$ will not be traversed again by the $\gamma$. In particular, track curvature mismeasurement is not responsible for erroneous sign determination in the central region of the CDF detector.

From knowledge of the underlying physical mechanisms by which jets fake electrons, muons, taus, and photons, the simple use of a reconstructed jet as a lepton or photon with an appropriate fake rate applied to the weight of the event needs slight modification, because a jet that has faked a lepton or photon is generally measured more accurately than a hadronic jet. Rather than using the momentum of the reconstructed jet, the momentum of the parent quark or gluon is determined by adding up all Monte Carlo particle level objects within a cone of $\Delta R = 0.4$ about the reconstructed jet. In misreconstructing a jet in an event, the momentum of the corresponding parent quark or gluon is used rather than the momentum of the reconstructed jet. A jet that fakes a photon then has momentum equal to the momentum of the parent quark or gluon plus a fractional correction equal to $0.01 \times (\text{parent } p_T - 25\ \text{GeV}) / (25\ \text{GeV})$ to account for leakage out of the cone of $\Delta R = 0.4$, and a further smearing of $0.2\,\sqrt{\text{GeV}} \times \sqrt{\text{parent } p_T}$, reflecting the electromagnetic resolution of the CDF detector. The momenta of jets that fake photons are multiplied by an overall factor of 1.12, and the momenta of jets that fake electrons, muons, or taus are multiplied by an overall factor of 0.95. These numbers are determined by the $\ell j$ and $\gamma j$ final states. The distributions most sensitive to these numbers are the missing energy and the jet $p_T$.

A $b$ quark fragmenting into a leading $b$ hadron that then decays leptonically or semileptonically results in an electron or muon that shares the $p_T$ of the parent $b$ quark with the associated neutrino. If all hadronic decay products are soft, the distribution of the momentum fraction carried by the charged lepton can be obtained by considering the decay of a scalar to two massless fermions. Isolated and energetic electrons and muons arising from parent $b$ quarks in this way are modeled as having $p_T$ equal to the parent $b$ quark $p_T$, multiplied by a random number uniformly distributed between 0 and 1.
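The momentum assignments described in the last two paragraphs can be sketched as follows. This is a minimal illustration, not the analysis code: the function names are hypothetical, the smearing is assumed to be Gaussian, and the order in which the leakage correction, the smearing, and the overall factor of 1.12 are combined is an assumption, since the text does not spell it out.

```python
import math
import random


def fake_photon_pt_parameters(parent_pt):
    """Mean and resolution (GeV) for a jet faking a photon, before scaling."""
    leakage = 0.01 * (parent_pt - 25.0) / 25.0   # fractional correction
    mean = parent_pt * (1.0 + leakage)
    sigma = 0.2 * math.sqrt(parent_pt)           # 0.2 sqrt(GeV) x sqrt(parent pT)
    return mean, sigma


def fake_photon_pt(parent_pt, rng, overall_factor=1.12):
    """Smear the corrected parent pT, then apply the overall factor (assumed order)."""
    mean, sigma = fake_photon_pt_parameters(parent_pt)
    return overall_factor * rng.gauss(mean, sigma)


def lepton_pt_from_b(parent_pt, rng):
    """Lepton pT modeled as the parent b quark pT times a uniform number in [0, 1)."""
    return parent_pt * rng.random()


rng = random.Random(1)
print(fake_photon_pt_parameters(100.0))
print(fake_photon_pt(100.0, rng))
print(lepton_pt_from_b(40.0, rng))
```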

2. Additional background sources 

This appendix provides additional details on the esti- 
mation of the standard model prediction. 



a. Cosmic ray and beam halo muons 

There are four dominant categories of events caused by cosmic ray muons penetrating the detector: $\mu\,{\not p}$, $\mu^+\mu^-$, $\gamma\,{\not p}$, and $j\,{\not p}$. There is negligible contribution from cosmic ray secondaries of any particle type other than muons.

A cosmic ray muon penetrating the CDF detector whose trajectory passes within 1 mm of the beam line and within $-60 < z < 60$ cm of the origin may be reconstructed as two outgoing muons. In this case the cosmic ray event is partitioned into the final state $\mu^+\mu^-$. If one of the tracks is missed, the cosmic ray event is partitioned into the final state $\mu\,{\not p}$. The standard CDF cosmic ray filter, which makes use of drift time information in the central tracking chamber, is used to reduce these two categories of cosmic ray events.

CDF data events with exactly one track (corresponding to one muon) and events with exactly two tracks (corresponding to two muons) are used to estimate the cosmic ray muon contribution to the final states $\mu\,{\not p}$ and $\mu^+\mu^-$ after the cosmic ray filter. This sample of events is used as the standard model background process cosmic $\mu$. The cosmic $\mu$ sample does not contribute to the events passing the analysis offline trigger, whose cleanup cuts require the presence of three or more tracks. Roughly 100 events are expected from cosmic ray muons in the categories $\mu^+\mu^-$ and $\mu\,{\not p}$. The sample cosmic $\mu$ is neglected from the background estimate, since there is no discrepancy that demands its inclusion.

The remaining two categories are $\gamma\,{\not p}$ and $j\,{\not p}$, resulting from a cosmic ray muon that penetrates the CDF electromagnetic or hadronic calorimeter and undergoes a hard bremsstrahlung in one calorimeter cell. Such an interaction can mimic a single photon or a single jet, respectively. The reconstruction algorithm infers the presence of significant missing energy balancing the "photon" or "jet." If this cosmic ray interaction occurs during a bunch crossing in which there is a $p\bar{p}$ interaction producing three or more tracks, the event will be partitioned into the final state $\gamma\,{\not p}$ or $j\,{\not p}$.

CDF data events with fewer than three tracks are used to estimate the cosmic ray muon contribution to the final states $\gamma\,{\not p}$ and $j\,{\not p}$. These samples of events are used as standard model background processes cosmic $\gamma$ and cosmic $j$ for the modeling of this background, corresponding to offline triggers requiring a photon with $p_T > 60$ GeV, or a jet with $p_T > 40$ GeV (prescaled) or $p_T > 200$ GeV (unprescaled), respectively. These samples do not contribute to the events passing the analysis offline trigger, whose cleanup cuts require three or more tracks. The contribution of these events is adjusted with correction factors that are listed as cosmic $\gamma$ and cosmic $j$ "k-factors" in Table I, but which are more properly understood as reflecting the number of bunch crossings with zero $p\bar{p}$ interactions (resulting in zero reconstructed tracks) relative to the number of bunch crossings with one or more interactions (resulting in three or more reconstructed tracks). Since the number of bunch crossings with no inelastic $p\bar{p}$ interactions is used to determine the CDF instantaneous luminosity, these cosmic correction factors can be viewed as containing direct information about the luminosity-averaged instantaneous luminosity.

FIG. 15: The distribution of transverse momentum and azimuthal angle for photons and jets in the $\gamma\,{\not p}$ and $j\,{\not p}$ final states, dominated by cosmic ray and beam halo muons. The vertical axis shows the number of events in each bin. Data are shown as filled (black) circles; the standard model prediction is shown as the shaded (red) histogram. Here the "standard model" prediction includes contributions from cosmic ray and beam halo muons, estimated using events containing fewer than three reconstructed tracks. The contribution from cosmic ray muons is flat in $\phi$, while the contribution from beam halo is localized to $\phi = 0$. The only degrees of freedom for the background to these final states are the cosmic $\gamma$ and cosmic $j$ correction factors, whose values are determined from the global Vista fit and provided in Table I.

The cosmic ray muon contribution to the final states $\gamma\,{\not p}$ and $j\,{\not p}$ is uniform as a function of the CDF azimuthal angle $\phi$. Consider the CDF detector to be a thick cylindrical shell, and consider two arbitrary infinitesimal volume elements at different locations in the material of the shell. Since the two volume elements have similar overburdens, the number of cosmic ray muons with $E > 20$ GeV penetrating the first volume element is very nearly the same as the number of cosmic ray muons with $E > 20$ GeV penetrating the second volume element. Since the material of the CDF calorimeters is uniform as a function of CDF azimuthal angle $\phi$, it follows that the cosmic ray muon contribution to the final states $\gamma\,{\not p}$ and $j\,{\not p}$ should also be uniform as a function of $\phi$. In particular, it is noted that the $\phi$ dependence of this contribution depends solely on the material distribution of the CDF calorimeter, which is uniform in $\phi$, and has no dependence on the distribution of the horizon angle of the muons from cosmic rays streaking through the atmosphere.

The final states $\gamma\,{\not p}$ and $j\,{\not p}$ are also populated by beam halo muons, traveling horizontally through the CDF detector in time with a bunch. A beam halo muon can undergo a hard bremsstrahlung in the electromagnetic or hadronic calorimeters, producing an energy deposit that can be reconstructed as a photon or jet, respectively. These beam halo muons tend to lie in the plane and outside of the Tevatron ring, thus horizontally penetrating the CDF detector along $z$ at $y = 0$, $x > 0$, and hence $\phi = 0$.

Figure 15 shows the $\gamma\,{\not p}$ and $j\,{\not p}$ final states, in which events come primarily from cosmic ray and beam halo muons.



b. Multiple interactions 

In order to estimate event overlaps, consider an interesting event observed in final state C, which looks like an overlap of two events in the final states A and B. An example is $C = e^+e^-4j$, $A = e^+e^-$, and $B = 4j$. It is desired to estimate how many C events are expected from the overlap of A and B events, given the observed frequencies of A and B.

Let $\mathcal{L}(t)$ be the instantaneous luminosity as a function of time $t$; let

$$L = \int_{\rm Run\,II} \mathcal{L}(t)\,dt = 927\ \text{pb}^{-1} \tag{A20}$$

denote the total integrated luminosity; and let

$$\bar{\mathcal{L}} = \frac{\int_{\rm Run\,II} \mathcal{L}(t)\,\mathcal{L}(t)\,dt}{\int_{\rm Run\,II} \mathcal{L}(t)\,dt} \approx 10^{32}\ \text{cm}^{-2}\,\text{s}^{-1} \tag{A21}$$

be the luminosity-averaged instantaneous luminosity. Denote by $t_0$ the time interval of 396 ns between successive bunch crossings. The total number of effective bunch crossings $X$ is then

$$X = \frac{L}{\bar{\mathcal{L}}\,t_0} \approx 2.3 \times 10^{13}. \tag{A22}$$

Letting $A$ and $B$ denote the number of observed events in final states A and B, it follows that the number of events in the final state C expected from overlap of A and B is

$$C = \frac{A\,B}{X}. \tag{A23}$$

Overlap events are included in the Vista background estimate, although their contribution is generally negligible.
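The size of the overlap estimate can be illustrated numerically. The sketch below evaluates Eqs. A22 and A23 with the luminosity values quoted in the text; the function names and the event counts in the usage lines are hypothetical, chosen only to show the scale of the result.

```python
PB_INV_TO_CM2_INV = 1.0e36  # 1 pb^-1 expressed in cm^-2


def effective_bunch_crossings(integrated_lumi_pb=927.0,
                              avg_inst_lumi_cm2_s=1.0e32,
                              bunch_spacing_s=396e-9):
    """Number of effective bunch crossings X of Eq. A22."""
    integrated_cm2 = integrated_lumi_pb * PB_INV_TO_CM2_INV
    return integrated_cm2 / (avg_inst_lumi_cm2_s * bunch_spacing_s)


def expected_overlaps(n_a, n_b, crossings):
    """Expected number of C events from overlap of A and B (Eq. A23)."""
    return n_a * n_b / crossings


X = effective_bunch_crossings()
print(f"X = {X:.2e}")
# Hypothetical populations: 1e3 events in A and 1e6 events in B.
print(f"expected overlaps = {expected_overlaps(1.0e3, 1.0e6, X):.1e}")
```

Even for well populated final states the expected overlap is a small fraction of an event, consistent with the statement that the contribution is generally negligible.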



c. Intrinsic $k_T$

Significant discrepancy is observed in many final states containing two objects o1 and o2 in the variables $\Delta\phi(\text{o1}, \text{o2})$, uncl $p_T$, and ${\not p}_T$. These discrepancies are ascribed to the sum of two effects: (1) an intrinsic Fermi motion of the colliding partons within the proton and anti-proton, and (2) soft radiation along the beam axis. The sum of these two effects appears to be larger in Nature than predicted by Pythia with the parameter tunes used for the generation of the samples employed in this analysis. This discrepancy is well known from previous studies at the Tevatron and elsewhere, and affects this analysis similarly to other Tevatron analyses.

The W and Z electroweak samples used in this analysis have been generated with an adjusted Pythia parameter that increases the intrinsic $k_T$. For all other generated standard model events, the net effect of the Fermi motion of the colliding partons and the soft non-perturbative radiation is hypothesized to be described by an overall "effective intrinsic $k_T$," and the center of mass of each event is given a transverse kick. Specifically, for every event of invariant mass $m$ and generated summed transverse momentum $\sum p_T$, a random number $k_T$ is pulled from the probability distribution

$$p(k_T) \propto (k_T < m/5) \times \left[\tfrac{4}{5}\,g(k_T; \mu = 0, \sigma_1) + \tfrac{1}{5}\,g(k_T; \mu = 0, \sigma_2)\right], \tag{A24}$$

where $(k_T < m/5)$ evaluates to unity if true and zero if false; $g(k_T; \mu, \sigma)$ is a Gaussian function of $k_T$ with center at $\mu$ and width $\sigma$; $\sigma_1 = 2.55\ \text{GeV} + 0.0085 \sum p_T$ is the width of the core of the double Gaussian; and $\sigma_2 = 5.25\ \text{GeV} + 0.0175 \sum p_T$ is the width of the second, wider Gaussian. The event is then boosted to an inertial frame traveling with speed $\beta = k_T/m$ with respect to the lab frame, in a direction transverse to the beam axis, where $m$ is the invariant mass of all reconstructed objects in the event, along an azimuthal angle pulled randomly from a uniform distribution between 0 and $2\pi$. The momenta of identified objects are recalculated in the lab frame. Sixty percent of the recoil kick is assigned to unclustered momentum in the event. The remaining forty percent of the recoil kick is assumed to disappear down the beam pipe, and contributes to the missing transverse momentum in the event. This picture, and the particular parameter values that accompany this story, are determined primarily by the uncl $p_T$ and ${\not p}_T$ distributions in highly populated two-object final states, including the low-$p_T$ $2j$ final state, the high-$p_T$ $2j$ final state, and the final states $j\gamma$, $e^+e^-$, and $\mu^+\mu^-$.
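The sampling step of this procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not the analysis code: $k_T$ is treated as a non-negative magnitude, the truncation at $m/5$ is enforced by rejection, and the relative weight of the two Gaussians is passed in as `core_fraction` (the value in the usage lines is only an example). All function and argument names are hypothetical.

```python
import math
import random


def sample_kt(mass, sum_pt, core_fraction, rng):
    """Draw k_T (GeV) from a truncated double Gaussian centered at zero."""
    sigma_core = 2.55 + 0.0085 * sum_pt   # width of the core Gaussian
    sigma_wide = 5.25 + 0.0175 * sum_pt   # width of the second, wider Gaussian
    while True:
        sigma = sigma_core if rng.random() < core_fraction else sigma_wide
        kt = abs(rng.gauss(0.0, sigma))   # assumption: k_T taken as a magnitude
        if kt < mass / 5.0:               # indicator (k_T < m/5) via rejection
            return kt


def transverse_kick(mass, sum_pt, core_fraction, rng):
    """Return (k_T, boost speed k_T/m, azimuthal angle of the kick)."""
    kt = sample_kt(mass, sum_pt, core_fraction, rng)
    beta = kt / mass
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return kt, beta, phi


rng = random.Random(42)
kt, beta, phi = transverse_kick(mass=100.0, sum_pt=80.0, core_fraction=0.8, rng=rng)
unclustered_share = 0.6 * kt   # sixty percent goes to unclustered momentum
beam_pipe_share = 0.4 * kt     # forty percent is lost down the beam pipe
print(kt, beta, phi, unclustered_share, beam_pipe_share)
```

Because $k_T < m/5$, the boost speed never exceeds 0.2, so the kick remains a modest perturbation of the event kinematics.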

Under the hypothesis described, reasonable although imperfect agreement with observation is obtained. The result of this analysis supports the conclusions of previous studies indicating that the effective intrinsic $k_T$ needed to match observation is quite large relative to naive expectation. That the data appear to require such a large effective intrinsic $k_T$ may be pointing out the need for some basic improvement to our understanding of this physics.

3. Global fit

This section describes the construction of the global $\chi^2$ used in the Vista global fit.

a. $\chi^2_k$

The bins in the CDF high-$p_T$ data sample are labeled by the index $k = (k_1, k_2)$, where each value of $k_1$ represents a phrase such as "this bin contains events with three objects: one with $17 < p_T < 25$ GeV and $|\eta| < 0.6$, one with $40 < p_T < 60$ GeV and $0.6 < |\eta| < 1.0$, and one with $25 < p_T < 40$ GeV and $1.0 < |\eta|$," and each value of $k_2$ represents a phrase such as "this bin contains events with three objects: an electron, muon, and jet, respectively." The reason for splitting $k$ into $k_1$ and $k_2$ is that a jet can fake an electron (mixing the contents of $k_2$), but an object with $|\eta| < 0.6$ cannot fake an object with $0.6 < |\eta| < 1.0$ (no mixing of $k_1$). The term corresponding to the $k$-th bin takes the form of Eq. 1, where Data[$k$] is the number of data events observed in the $k$-th bin, SM[$k$] is the number of events predicted by the standard model in the $k$-th bin, $\delta$SM[$k$] is the Monte Carlo statistical uncertainty on the standard model prediction in the $k$-th bin, and $\sqrt{\text{SM}[k]}$ is the statistical uncertainty on the prediction in the $k$-th bin. To legitimize the use of Gaussian errors, only bins containing eight or more data events are considered. The standard model prediction SM[$k$] for the $k$-th bin can be written in terms of the introduced correction factors as

$$\text{SM}[k] = \text{SM}[(k_1, k_2)] = \sum_{k_2'} \; \sum_{l \,\in\, \text{processes}} \left(\int \mathcal{L}\,dt\right) \cdot (\text{kFactor}[l]) \cdot (\text{SM}_0[(k_1, k_2')][l]) \cdot (\text{probabilityToBeSoMisreconstructed}[(k_1, k_2')][k_2]) \cdot (\text{probabilityPassesTrigger}[(k_1, k_2)]), \tag{A25}$$

where SM[$k$] is the standard model prediction for the $k$-th bin; the index $k$ is the Cartesian product of the two indices $k_1$ and $k_2$ introduced above, labeling the regions of the detector in which there are energy clusters and the identified objects corresponding to those clusters, respectively; the index $k_2'$ is a dummy summation index; the index $l$ labels standard model background processes, such as dijet production or W+1 jet production; $\text{SM}_0[(k_1, k_2')][l]$ is the initial number of standard model events predicted in bin $(k_1, k_2')$ from the process labeled by the index $l$; probabilityToBeSoMisreconstructed$[(k_1, k_2')][k_2]$ is the probability that an event produced with energy clusters in the detector regions labeled by $k_1$ that are identified as objects labeled by $k_2'$ would be mistaken as having objects labeled by $k_2$; and probabilityPassesTrigger$[(k_1, k_2)]$ represents the probability that an event produced with energy clusters in the detector regions labeled by $k_1$ that are identified as objects labeled by $k_2$ would pass the trigger.

The quantity $\text{SM}_0[(k_1, k_2')][l]$ is obtained by generating some number $n_l$ (say $10^4$) of Monte Carlo events corresponding to the process $l$. The event generator provides a cross section $\sigma_l$ for this process $l$. The weight of each of these Monte Carlo events is equal to $\sigma_l/n_l$. Passing these events through the CDF simulation and reconstruction, the sum of the weights of these events falling into the bin $(k_1, k_2')$ is $\text{SM}_0[(k_1, k_2')][l]$.
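The bookkeeping of Eq. A25 can be illustrated with a toy example. Everything in the sketch below is hypothetical: the bin labels, the cross section, the misreconstruction and trigger probabilities, and the function names are invented for illustration, and the dictionary layout is just one convenient way to hold the indices $(k_1, k_2')$, $k_2$, and $l$.

```python
def mc_bin_yields(event_bins, cross_section_pb, n_generated):
    """SM_0 per bin: sum of weights sigma_l / n_l of events landing in each bin."""
    weight = cross_section_pb / n_generated
    yields = {}
    for bin_key in event_bins:
        yields[bin_key] = yields.get(bin_key, 0.0) + weight
    return yields


def sm_prediction(k1, k2, luminosity_pb, k_factor, sm0, p_misreco, p_trigger):
    """Evaluate the double sum of Eq. A25 for the bin (k1, k2)."""
    total = 0.0
    for process, yields in sm0.items():                  # sum over processes l
        for (bin_k1, k2_prime), sm0_value in yields.items():
            if bin_k1 != k1:                             # no mixing of k1
                continue
            prob = p_misreco.get(((k1, k2_prime), k2), 0.0)
            total += luminosity_pb * k_factor[process] * sm0_value * prob
    return total * p_trigger[(k1, k2)]


# Toy inputs: 10^4 generated dijet events, 6000 of them in a "central" bin.
events = [("central", "jj")] * 6000 + [("plug", "jj")] * 4000
sm0 = {"dijet": mc_bin_yields(events, cross_section_pb=100.0, n_generated=10000)}
prediction = sm_prediction(
    "central", "ej",
    luminosity_pb=927.0,
    k_factor={"dijet": 1.1},
    sm0=sm0,
    p_misreco={(("central", "jj"), "ej"): 0.001, (("central", "jj"), "jj"): 0.99},
    p_trigger={("central", "ej"): 0.9},
)
print(prediction)
```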



b. $\chi^2_{\rm constraints}$

The term $\chi^2_{\rm constraints}(\vec{s})$ in Eq. 2 reflects constraints on the values of the correction factors determined by data other than those in the global high-$p_T$ sample. These constraints include k-factors taken from theoretical calculations and numbers from the CDF literature when use is made of CDF data external to the Vista high-$p_T$ sample. The constraints imposed are:

• The luminosity (0001) is constrained to be within 6% of the value measured by the CDF Cerenkov luminosity counters.

• The fake rate $p(q \to \gamma)$ (0039) is constrained to be $2.6 \times 10^{-4} \pm 1.5 \times 10^{-5}$, from the single particle gun study of Appendix A 1.

• The fake rate $p(e \to \gamma)$ (0032) plus the efficiency $p(e \to e)$ (0026) for electrons in the plug is constrained to be within 1% of unity.

• Noting $p(q \to \gamma)$ corresponds to correction factor 0039, $p(q \to \pi^\pm) = 2\,p(q \to \pi^0)$, and $p(q \to \pi^0) = p(q \to \gamma)/p(\pi^0 \to \gamma)$, and taking $p(\pi^0 \to \gamma) = 0.6$ and $p(\pi^\pm \to \tau) = 0.415$ from the single particle gun study of Appendix A 1, the fake rate $p(q \to \tau)$ (0038) is constrained to $p(q \to \tau) = p(q \to \pi^\pm)\,p(\pi^\pm \to \tau) \pm 10\%$.

• The k-factors for dijet production (0018 and 0019) are constrained to 1.10 ± 0.05 and 1.33 ± 0.05 in the kinematic regions $\hat{p}_T < 150$ GeV and $\hat{p}_T > 150$ GeV, respectively, where $\hat{p}_T$ is the transverse momentum of the scattered partons in the $2 \to 2$ process in the colliding parton center of momentum frame.

• The inclusive k-factor for $\gamma + N$ jets (0004-0007) is constrained to 1.25 ± 0.15 [44].

• The inclusive k-factor for $\gamma\gamma + N$ jets (0008-0010) is constrained to 2.0 ± 0.15 [45].

• The inclusive k-factors for W and Z production (0011-0014 and 0015-0017) are subject to a 2-dimensional Gaussian constraint, with mean at the NNLO/LO theoretical values [46], and a covariance matrix that encapsulates the highly correlated theoretical uncertainties, as discussed in Appendix A 4.

• Trigger efficiency correction factors are constrained to be less than unity.

• All correction factors are constrained to be positive.
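As an illustration of how the constraint on $p(q \to \tau)$ follows from the relations in the list above, the central value and the 10% window can be computed from the quoted inputs. The variable names are illustrative; $p(q \to \gamma)$ is set to the central value of its own constraint.

```python
p_q_gamma = 2.6e-4        # central value of the constraint on correction factor 0039
p_pi0_gamma = 0.6         # from the single particle gun study
p_pipm_tau = 0.415        # from the single particle gun study

p_q_pi0 = p_q_gamma / p_pi0_gamma       # p(q -> pi0) = p(q -> gamma) / p(pi0 -> gamma)
p_q_pipm = 2.0 * p_q_pi0                # p(q -> pi+-) = 2 p(q -> pi0)
p_q_tau = p_q_pipm * p_pipm_tau         # central value of the p(q -> tau) constraint

window = (0.9 * p_q_tau, 1.1 * p_q_tau)  # +-10% constraint window
print(f"p(q -> tau) = {p_q_tau:.3e}, window = ({window[0]:.3e}, {window[1]:.3e})")
```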



c. Covariance matrix

This section describes the correction factor covariance matrix $\Sigma$. The inverse of the covariance matrix is obtained from

$$\Sigma^{-1}_{ij} = \frac{1}{2} \left. \frac{\partial^2 \chi^2(\vec{s})}{\partial s_i\,\partial s_j} \right|_{\vec{s}_0}, \tag{A26}$$

where $\chi^2(\vec{s})$ is defined by Eq. 2 as a function of the correction factor vector $\vec{s}$, vector elements $s_i$ and $s_j$ are the $i$-th and $j$-th correction factors, and $\vec{s}_0$ is the vector of correction factors that minimizes $\chi^2(\vec{s})$. Numerical estimation of the right hand side of Eq. A26 is achieved by calculating $\chi^2$ at $\vec{s}_0$ and at positions slightly displaced from $\vec{s}_0$ in the direction of the $i$-th and $j$-th correction factors,


denoted by the unit vectors $\hat{i}$ and $\hat{j}$. Approximating the second partial derivative

$$\left. \frac{\partial^2 \chi^2}{\partial s_j\,\partial s_i} \right|_{\vec{s}_0} \approx \frac{\chi^2(\vec{s}_0 + \hat{i}\,\delta s_i + \hat{j}\,\delta s_j) - \chi^2(\vec{s}_0 + \hat{j}\,\delta s_j)}{\delta s_j\,\delta s_i} - \frac{\chi^2(\vec{s}_0 + \hat{i}\,\delta s_i) - \chi^2(\vec{s}_0)}{\delta s_j\,\delta s_i}$$

leads to

$$\Sigma^{-1}_{ij} = \left[\chi^2(\vec{s}_0 + \delta s_i\,\hat{i} + \delta s_j\,\hat{j}) - \chi^2(\vec{s}_0 + \delta s_i\,\hat{i}) - \chi^2(\vec{s}_0 + \delta s_j\,\hat{j}) + \chi^2(\vec{s}_0)\right] / (2\,\delta s_i\,\delta s_j), \tag{A27}$$

for appropriately small steps $\delta s_i$ and $\delta s_j$ away from the minimum. The covariance matrix $\Sigma$ is calculated by inverting $\Sigma^{-1}$. The diagonal element $\Sigma_{ii}$ is the variance $\sigma_i^2$ of the $i$-th correction factor, and the correlation $\rho_{ij}$ between the $i$-th and $j$-th correction factors is $\rho_{ij} = \Sigma_{ij}/(\sigma_i \sigma_j)$. The variances of each correction factor, corresponding to the diagonal elements of the covariance matrix, are shown in Table I. The correlation matrix obtained is shown in Table El

4. Correction factor values

This section provides notes on the values of the Vista correction factors obtained from a global fit of standard model prediction to data. The correction factors considered are numbers that can in principle be calculated a priori, but whose calculation is in practice not feasible. These correction factors divide naturally into two classes, the first of which reflects the difficulty of calculating the standard model prediction to all orders, and the second of which reflects the difficulty of understanding from first principles the response of the experimental apparatus.

The theoretical correction factors considered are of two types. The difficulty of calculating the standard model prediction for many processes to all orders in perturbation theory is handled through the introduction of k-factors, representing the ratio of the true all orders prediction to the prediction at lowest order in perturbation theory. Uncertainties in the distribution of partons inside the colliding proton and anti-proton as a function of parton momentum are in principle handled through the introduction of correction factors associated with parton distribution functions, but there are currently no discrepancies to motivate this.

Experimental correction factors correspond to numbers describing the response of the CDF detector that are precisely calculable in principle, but that are in practice best constrained by the high-$p_T$ data themselves. These correction factors take the form of the integrated luminosity, object identification efficiencies, object misidentification probabilities, trigger efficiencies, and energy scales.


a. k-factors 

For nearly all standard model processes, k-factors are used as an overall multiplicative constant, rather than being considered to be a function of one or more kinematic variables. The spirit of the approach is to introduce as few correction factors as possible, and to only introduce correction factors motivated by specific discrepancies.

0001. The integrated luminosity of the analysis sample has a close relationship with the theoretically determined values of inclusive W and Z production at the Tevatron. Figure \W\ shows the variation in calculated inclusive W and Z k-factors under changes in the assumed parton distribution functions. Each point represents a different W and Z inclusive cross section determined using modified parton distribution functions. The use of 16 bases to reflect systematic uncertainties results in 32 black dots in Fig. [T|5J The uncertainties in the W and Z cross sections due to variations in the renormalization and factorization scales are nearly 100% correlated; varying these scales affects both the W and Z inclusive cross sections in the same way. The uncertainties in the parton distribution functions and the choice of renormalization and factorization scales represent the dominant contributions to the theoretical uncertainty in the total inclusive W and Z cross section calculations at the Tevatron. The term in $\chi^2_{\rm constraints}$ that reflects our knowledge of the theoretical prediction of the inclusive W and Z cross sections
0004 -.56 +.37 +.39 1 +.9 +.77 +.48 +.61 +.33 +.23 +.49 +.43 +.29 +.16 +.46 +.32 +.18 +.5 +.53 +.53 +.52 +.49 +.49 +.35 +.3 +.1 +.25 +.2 -.46 +.03 +.13 -.44 -.09 -.03 -.01 +.02 -.32 -.62 -.17 +.11 +.11 +.07 +.06

0005 -.53 +.38 +.37 +.9 1 +.75 +.46 +.62 +.31 +.21 +.46 +.41 +.27 +.15 +.44 +.3 +.17 +.5 +.51 +.48 +.5 +.47 +.46 +.33 +.29 +.1 +.24 +.19 -.49 +.03 -.02 +.12 -.43 -.09 -.04 +.02 -.29 -.57 -.16 +.11 +.1 +.07 +.06 

0006 -.45 +.33 +.31 +.77 +.75 1 +.4 +.54 +.29 +.13 +.4 +.35 +.24 +.13 +.38 +.26 +.14 +.43 +.44 +.42 +.42 +.35 +.4 +.28 +.25 +.09 +.2 +.16 -.45 +.02 -.01 +.1 -.36 -.07 -.04 +.01 -.24 -.46 -.14 +.09 +.09 +.06 +.05 

0007 -.26 +.2 +.18 +.48 +.46 +.4 1 +.34 +.18 +.09 +.23 +.2 +.13 +.08 +.22 +.15 +.08 +.24 +.25 +.25 +.24 +.21 +.22 +.02 +.14 +.05 +.12 +.09 -.29 +.01 -.02 +.06 -.23 -.04 -.02 +.01 +.01 -.15 -.3 -.09 +.05 +.05 +.03 +.03 

0008 -.36 +.34 +.25 +.61 +.62 +.54 +.34 1 +.37 +.28 +.32 +.28 +.19 +.1 +.3 +.22 +.12 +.34 +.34 +.34 +.33 +.31 +.31 +.22 +.18 +.06 +.16 +.12 -.61 -.03 +.11 -.29 -.06 -.03 +.01 -.09 -.17 -.28 +.07 +.07 +.04 +.04 

0009 -.21 +.18 +.14 +.33 +.31 +.29 +.18 +.37 1 +.06 +.19 +.17 +.11 +.06 +.2 +.06 +.11 +.2 +.2 +.19 +.19 +.18 +.18 +.13 +.08 +.03 +.07 +.06 -.31 +.05 +.06 -.15 -.03 -.01 +.01 -.04 -.08 -.29 +.04 +.04 +.03 +.02 

0010 -.14 +.12 +.1 +.23 +.21 +.13 +.09 +.28 +.06 1 +.13 +.11 +.08 +.06 +.13 +.11 -.03 +.13 +.14 +.13 +.13 +.12 +.12 +.09 +.05 -.01 +.05 +.04 -.19 +.06 +.07 -.1 -.03 -.01 +.01 -.04 -.07 -.26 +.03 +.04 +.01 +.01 

0011 -.87 +.28 +.61 +.49 +.46 +.4 +.23 +.32 +.19 +.13 1 +.85 +.58 +.32 +.89 +.61 +.33 +.83 +.84 +.82 +.82 +.76 +.77 +.54 +.25 +.09 +.16 +.15 +.07 +.07 +.12 +.1 +.04 +.02 +.01 -.01 -.02 +.01 -.13 -.04 -.11 -.09 

0012 -.77 +.25 +.53 +.43 +.41 +.35 +.2 +.28 +.17 +.11 +.85 1 +.33 +.35 +.79 +.49 +.33 +.72 +.74 +.74 +.72 +.68 +.67 +.47 +.21 +.08 +.15 +.13 +.06 +.06 +.01 +.11 +.1 -.02 -.09 -.01 +.01 -.01 -.01 +.01 -.14 +.01 -.06 -.05 

0013 -.51 +.17 +.35 +.29 +.27 +.24 +.13 +.19 +.11 +.08 +.58 +.33 1 -.21 +.52 +.35 +.15 +.5 +.49 +.46 +.48 +.46 +.45 +.36 +.15 +.06 +.1 +.09 +.04 +.04 -.01 +.07 +.05 +.07 -.07 -.01 -.01 +.01 -.1 -.07 -.06 -.05 

0014 -.28 +.09 +.2 +.16 +.15 +.13 +.08 +.1 +.06 +.06 +.32 +.35 -.21 1 +.29 +.26 -.04 +.28 +.27 +.28 +.26 +.21 +.26 +.09 +.08 +.03 +.05 +.05 +.02 +.02 +.03 +.01 -.07 -.01 -.01 +.01 -.05 -.01 -.03 -.02 

0015 -.82 +.27 +.57 +.46 +.44 +.38 +.22 +.3 +.2 +.13 +.89 +.79 +.52 +.29 1 +.58 +.35 +.77 +.78 +.77 +.76 +.71 +.71 +.5 +.09 +.04 +.06 +.05 +.05 +.02 +.03 -.02 -.03 -.06 -.01 -.02 +.03 +.04 +.03 +.04 +.03 

0016 -.55 +.18 +.38 +.32 +.3 +.26 +.15 +.22 +.06 +.11 +.61 +.49 +.35 +.26 +.58 1 -.09 +.52 +.53 +.52 +.52 +.49 +.48 +.35 +.1 +.03 +.08 +.07 +.03 +.02 +.04 -.03 -.01 -.1 +.01 -.01 -.02 +.02 +.01 -.02 -.01 

0017 -.31 +.1 +.21 +.18 +.17 +.14 +.08 +.12 +.11 -.03 +.33 +.33 +.15 -.04 +.35 -.09 1 +.3 +.3 +.29 +.29 +.25 +.28 +.16 +.03 -.02 +.04 +.04 +.02 +.04 +.04 -.03 -.06 -.06 +.01 -.01 -.01 -.02 +.03 +.05 +.01 +.01 

0018 -.95 +.3 +.66 +.5 +.5 +.43 +.24 +.34 +.2 +.13 +.83 +.72 +.5 +.28 +.77 +.52 +.3 1 +.91 +.92 +.89 +.85 +.83 +.6 +.51 +.16 +.43 +.35 +.09 +.1 -.07 +.23 -.16 -.23 -.16 +.02 -.01 -.03 +.01 +.21 +.18 +.12 +.1 

0019 -.96 +.31 +.66 +.53 +.51 +.44 +.25 +.34 +.2 +.14 +.84 +.74 +.49 +.27 +.78 +.53 +.3 +.91 1 +.91 +.91 +.84 +.85 +.59 +.52 +.16 +.44 +.36 +.09 +.1 +.03 +.23 -.07 -.17 -.08 -.06 +.04 -.01 -.02 +.02 +.21 +.2 +.12 +.11 

0020 -.94 +.31 +.66 +.53 +.48 +.42 +.25 +.34 +.19 +.13 +.82 +.74 +.46 +.28 +.77 +.52 +.29 +.92 +.91 1 +.87 +.84 +.83 +.6 +.51 +.16 +.43 +.35 +.08 +.1 -.05 +.23 -.13 -.24 -.13 +.01 +.01 -.02 -.03 +.01 +.21 +.2 +.12 +.11 

0021 -.94 +.3 +.65 +.52 +.5 +.42 +.24 +.33 +.19 +.13 +.82 +.72 +.48 +.26 +.76 +.52 +.29 +.89 +.91 +.87 1 +.82 +.83 +.57 +.51 +.16 +.43 +.35 +.08 +.1 +.04 +.23 -.07 -.16 -.07 -.08 +.04 -.01 -.02 +.02 +.2 +.19 +.12 +.1 

0022 -.88 +.28 +.61 +.49 +.47 +.35 +.21 +.31 +.18 +.12 +.76 +.68 +.46 +.21 +.71 +.49 +.25 +.85 +.84 +.84 +.82 1 +.73 +.55 +.47 +.15 +.4 +.33 +.08 +.09 -.04 +.21 -.1 -.21 -.1 +.01 +.02 -.01 -.03 +.02 +.19 +.18 +.11 +.1 

0023 -.88 +.28 +.61 +.49 +.46 +.4 +.22 +.31 +.18 +.12 +.77 +.67 +.45 +.26 +.71 +.48 +.28 +.83 +.85 +.83 +.83 +.73 1 +.53 +.48 +.15 +.4 +.33 +.08 +.09 +.01 +.21 -.06 -.15 -.07 -.04 +.03 -.01 -.02 +.02 +.19 +.18 +.11 +.1 

0024 -.62 +.2 +.43 +.35 +.33 +.28 +.02 +.22 +.13 +.09 +.54 +.47 +.36 +.09 +.5 +.35 +.16 +.6 +.59 +.6 +.57 +.55 +.53 1 +.33 +.11 +.28 +.23 +.05 +.06 -.01 +.15 -.09 -.16 -.07 +.01 +.02 -.01 -.02 +.01 +.13 +.13 +.08 +.07 

0025 -.54 +.18 +.38 +.3 +.29 +.25 +.14 +.18 +.08 +.05 +.25 +.21 +.15 +.08 +.09 +.1 +.03 +.51 +.52 +.51 +.51 +.47 +.48 +.33 1 +.23 +.6 +.49 +.05 +.04 -.01 +.25 -.03 -.23 -.05 +.01 +.04 -.01 -.02 +.09 +.12 +.28 +.19 +.17 

0026 -.17 +.06 +.12 +.1 +.1 +.09 +.05 +.06 +.03 -.01 +.09 +.08 +.06 +.03 +.04 +.03 -.02 +.16 +.16 +.16 +.16 +.15 +.15 +.11 +.23 1 +.18 +.15 +.01 +.01 -.66 -.03 +.37 -.01 -.02 -.01 +.19 +.07 -.44 +.05 +.04 

0027 -.46 +.15 +.32 +.25 +.24 +.2 +.12 +.16 +.07 +.05 +.16 +.15 +.1 +.05 +.06 +.08 +.04 +.43 +.44 +.43 +.43 +.4 +.4 +.28 +.6 +.18 1 +.29 +.05 +.1 +.27 -.15 -.25 +.05 -.01 -.01 +.35 +.3 -.33 +.33 

0028 -.37 +.12 +.26 +.2 +.19 +.16 +.09 +.12 +.06 +.04 +.15 +.13 +.09 +.05 +.05 +.07 +.04 +.35 +.36 +.35 +.35 +.33 +.33 +.23 +.49 +.15 +.29 1 +.05 +.08 +.23 -.1 -.19 +.03 +.04 -.01 -.01 +.26 +.23 +.32 -.54 

0029 -.09 -.31 +.06 -.46 -.49 -.45 -.29 -.61 -.31 -.19 +.07 +.06 +.04 +.02 +.05 +.03 +.02 +.09 +.09 +.08 +.08 +.08 +.08 +.05 +.05 +.01 +.05 +.05 1 +.06 +.03 +.31 -.02 +.01 +.01 +.01 +.01 +.21 +.03 +.03 +.01 +.01 

0030 -.1 +.02 +.07 +.03 +.03 +.02 +.01 -.03 +.05 +.06 +.07 +.06 +.04 +.02 +.02 +.02 +.04 +.1 +.1 +.1 +.1 +.09 +.09 +.06 +.04 +.01 +.1 +.08 +.06 1 -.13 -.02 -.03 +.03 -.76 +.08 +.05 +.01 +.01 

0031 -.01 -.02 -.01 -.02 +.01 -.01 -.07 +.03 -.05 +.04 -.04 +.01 -.01 -.01 10 +.07 +.04 +.07 -.83 +.03 +.01 +.03 +.01 -.01 -.01 -.01 

0032 -.24 +.08 +.17 +.13 +.12 +.1 +.06 +.11 +.06 +.07 +.12 +.11 +.07 +.03 +.03 +.04 +.04 +.23 +.23 +.23 +.23 +.21 +.21 +.15 +.25 -.66 +.27 +.23 +.03 -.13 1 -.06 -.48 -.02 +.05 -.08 +.17 +.57 +.06 +.05 

0033 +.08 -.14 -.05 -.44 -.43 -.36 -.23 -.29 -.15 -.1 +.1 +.1 +.05 +.01 -.02 -.03 -.03 -.16 -.07 -.13 -.07 -.1 -.06 -.09 -.03 -.03 -.15 -.1 +.31 -.02 +.07 -.06 1 +.23 +.17 -.02 -.01 +.2 +.39 +.14 -.55 -.18 -.21 -.18 

0034 +.17 -.06 -.12 -.09 -.09 -.07 -.04 -.06 -.03 -.03 +.04 -.02 +.07 -.03 -.01 -.06 -.23 -.17 -.24 -.16 -.21 -.15 -.16 -.23 +.37 -.25 -.19 -.02 -.03 +.04 -.48 +.23 1 +.16 -.01 -.04 +.01 +.02 +.09 -.31 -.89 -.22 -.19 

0035 +.08 -.03 -.06 -.03 -.04 -.04 -.02 -.03 -.01 -.01 +.02 -.09 -.07 -.07 -.06 -.1 -.06 -.16 -.08 -.13 -.07 -.1 -.07 -.07 -.05 -.01 +.03 +.07 -.02 +.17 +.16 1 -.02 +.02 +.01 +.01 -.12 -.1 -.26 -.23 

0036 -.01 +.01 -.01 +.01 -.01 +.02 -.06 +.01 -.08 +.01 -.04 +.01 +.01 +.01 -.83 -.02 -.01 -.02 10 +.01 +.01 +.01 +.02 +.01 

0037 -.04 +.01 +.03 +.02 +.02 +.01 +.01 +.01 +.01 +.01 +.01 +.01 +.01 +.01 +.04 +.01 +.04 +.02 +.03 +.02 +.04 -.02 +.05 +.04 +.01 +.03 +.03 +.05 -.01 -.04 +.02 1 +.01 +.01 -.03 +.06 +.07 +.03 +.02 

0038 +.01 -.03 -.01 -.32 -.29 -.24 -.15 -.09 -.04 -.04 -.01 -.01 -.01 -.01 -.01 -.01 -.01 -.01 -.01 -.02 -.01 -.01 -.01 -.01 -.01 -.01 -.01 +.01 +.01 +.2 +.01 +.01 +.01 1 +.51 +.06 

0039 +.02 -.07 -.01 -.62 -.57 -.46 -.3 -.17 -.08 -.07 -.02 -.01 -.01 -.01 -.02 -.02 -.01 -.03 -.02 -.03 -.02 -.03 -.02 -.02 -.02 -.01 -.01 -.01 +.01 +.03 +.39 +.02 +.01 +.01 +.01 +.51 1 +.12 -.01 

0040 -.02 -.07 +.01 -.17 -.16 -.14 -.09 -.28 -.29 -.26 +.01 +.01 +.01 +.01 +.03 +.02 -.02 +.01 +.02 +.01 +.02 +.02 +.02 +.01 +.09 +.19 +.21 -.76 +.01 -.08 +.14 +.09 -.03 +.06 +.12 1 -.04 -.11 +.01 +.01 

0041 -.22 +.07 +.15 +.11 +.11 +.09 +.05 +.07 +.04 +.03 -.13 -.14 -.1 -.05 +.04 +.03 +.21 +.21 +.21 +.2 +.19 +.19 +.13 +.12 +.07 +.35 +.26 +.03 +.08 -.01 +.17 -.55 -.31 -.12 +.01 +.06 -.04 1 +.37 +.39 +.33 

0042 -.21 +.07 +.14 +.11 +.1 +.09 +.05 +.07 +.04 +.04 -.04 +.01 -.07 -.01 +.03 +.01 +.05 +.18 +.2 +.2 +.19 +.18 +.18 +.13 +.28 -.44 +.3 +.23 +.03 +.05 +.57 -.18 -.89 -.1 +.01 +.07 -.01 -.11 +.37 1 +.25 +.22 

0043 -.13 +.04 +.09 +.07 +.07 +.06 +.03 +.04 +.03 +.01 -.11 -.06 -.06 -.03 +.04 -.02 +.01 +.12 +.12 +.12 +.12 +.11 +.11 +.08 +.19 +.05 -.33 +.32 +.01 +.01 -.01 +.06 -.21 -.22 -.26 +.02 +.03 +.01 +.39 +.25 1 +.07 

0044 -.11 +.04 +.08 +.06 +.06 +.05 +.03 +.04 +.02 +.01 -.09 -.05 -.05 -.02 +.03 -.01 +.01 +.1 +.11 +.11 +.1 +.1 +.1 +.07 +.17 +.04 +.33 -.54 +.01 +.01 -.01 +.05 -.18 -.19 -.23 +.01 +.02 +.01 +.33 +.22 +.07 1 



TABLE V: Correction factor correlation matrix. The top row and left column show correction factor codes. Each element of the matrix shows the correlation between 
the correction factors corresponding to the column and row. Each matrix element is dimensionless; the elements along the diagonal are unity; the matrix is symmetric; 
positive elements indicate positive correlation, and negative elements anti-correlation. 
FIG. 16: Variation of the k-factors for inclusive W and Z
production under different choices of parton distribution
functions, from the Alekhin parton distribution error set [47]. The
correlation of the uncertainty on these two k-factors due to
uncertainty in the parton distribution functions is 0.955.

explicitly acknowledges this high degree of correlation.

Theoretical constraints on all other k-factors are assumed
to be uncorrelated with each other, not because the
uncertainties of these calculations are indeed uncorrelated,
but rather because the correlations among these
computations are poorly known.

0002, 0003. The cosmic γ and cosmic j backgrounds
are estimated using events recorded in the CDF
data with one or more reconstructed photons and with
two or fewer reconstructed tracks. The use of events with
two or fewer reconstructed tracks is a new technique for
estimating these backgrounds. These correction factors
are primarily constrained by the number of events in the
Vista γp̸ and jp̸ final states. The values are related
to (and consistent with) the fraction of bunch crossings
with one or more inelastic pp̄ interactions, complicated
slightly by the requirement that any jet falling in the
final state jp̸ has at least 5 GeV of track pT within a cone
of 0.4 relative to the jet axis.

0004, 0005, 0006, 0007. The NLOJET++ calculation
of the γj inclusive k-factor constrains the cross section
weighted sum of the γj, γ2j, γ3j, and γ4j correction
factors to 1.25 ± 0.15.

0008, 0009, 0010. The DIPHOX calculation of the
inclusive γγ cross section at NLO constrains the weighted
sum of these correction factors to 2.0 ± 0.15 [45]. From the
table of correction factors, the γγj k-factor (0009) appears
anomalously large. Figure 17 shows a calculation of this γγj
k-factor using NLOJET++ [43, 44] as a function of summed
transverse momentum. The NLO correction to the LO prediction is
found to be large, and not manifestly inconsistent with
the value for this k-factor determined from the Vista
fit. The cross section for γγ2j production has not been
calculated at NLO.



FIG. 17: Calculation of the γγj k-factor, as a function of jet
transverse momentum. The effect of changing the factorization
scale by a factor of two in either direction is also shown
(small black points with error bars).



0011, 0012, 0013, 0014. These correction factors
correspond to k-factors for W production in association
with zero, one, two, and three or more jets, respectively.
A linear combination of these correction factors is
constrained by the requirement that the inclusive W
production cross section is consistent with the NNLO
calculation of Ref. [47]. The values of these correction factors,
and their trend of decreasing as the number of jets
increases, depend heavily on the choice of renormalization
and factorization scales. The individual correction
factors are not explicitly constrained by an NLO calculation.

0015, 0016, 0017. These correction factors correspond
to k-factors for Z production in association with
zero, one, and two or more jets, respectively. A linear
combination of these correction factors is constrained by
the requirement that the inclusive Z production cross
section is consistent with the NNLO calculation of Ref. [13].

0018, 0019. The two k-factors for dijet production
correspond to two bins in p̂T, the pT of the hard two
to two scattering in the parton center of mass frame.
These correction factors are constrained by an NLO
calculation, and show expected behavior as a function
of p̂T.

0020, 0021. The two k-factors for 3-jet production,
corresponding to two bins in p̂T, are unconstrained by
any NLO calculation, but show reasonable behavior as a
function of p̂T.

0022, 0023. The k-factors for 4-jet production,
corresponding to two bins in p̂T, are unconstrained by any
NLO calculation, but show reasonable behavior as a
function of p̂T.

0024. The k-factor for the production of five or more
jets, constrained primarily by the Vista low-pT 5j final
state, is found to be close to unity.
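
As an illustration of how a constraint of the kind described in this subsection might enter a fit, the following minimal sketch computes a cross-section-weighted average of several correction factors and a Gaussian penalty for its deviation from an externally calculated value, using the photon plus jets constraint of 1.25 ± 0.15 as the example. The normalization of the weighted sum, the Gaussian form of the penalty, and all cross-section and trial k-factor values are assumptions made for illustration; they are not taken from the Vista fit.

```python
from typing import Sequence


def weighted_k_factor(k_factors: Sequence[float],
                      cross_sections: Sequence[float]) -> float:
    """Cross-section-weighted average of per-multiplicity k-factors."""
    if len(k_factors) != len(cross_sections):
        raise ValueError("need one cross section per k-factor")
    total = sum(cross_sections)
    return sum(k * s for k, s in zip(k_factors, cross_sections)) / total


def constraint_penalty(value: float, central: float, uncertainty: float) -> float:
    """Gaussian chi-square penalty for a constrained quantity (assumed form)."""
    return ((value - central) / uncertainty) ** 2


# Hypothetical leading-order cross sections (arbitrary units) and trial
# k-factors for photon + 1, 2, 3, and 4 jets; none of these numbers come
# from the article.
lo_cross_sections = [100.0, 20.0, 4.0, 1.0]
trial_k_factors = [1.2, 1.4, 1.5, 1.6]

k_inclusive = weighted_k_factor(trial_k_factors, lo_cross_sections)
penalty = constraint_penalty(k_inclusive, central=1.25, uncertainty=0.15)
print(f"weighted k-factor = {k_inclusive:.3f}, penalty = {penalty:.3f}")
```

Because the lowest jet multiplicity dominates the cross section, the weighted average is driven mostly by the first correction factor, which leaves the higher-multiplicity factors comparatively free.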



b. Identification efficiencies 

The correction factors in this section, although billed
as "identification efficiencies," are truly ratios of the
identification efficiency in the data relative to the
identification efficiency in CdfSim. A correction factor value of
unity indicates a proper modeling of the overall
identification efficiency by CdfSim; a correction factor value
of 0.5 indicates that CdfSim overestimates the overall
identification efficiency by a factor of two.

0025. The central electron identification efficiency
scale factor is close to unity, indicating the central
electron efficiency measured in data is similar (to within 1%)
to the central electron efficiency in the CDF detector
simulation. This reflects an emphasis within CDF on tuning
the detector simulation for central electrons. The
determination of this correction factor is dominated by the
Vista final states ep̸ and e+e−, where one of the
electrons has |η| < 1.

0026. The plug electron identification efficiency scale
factor is several percent less than unity, indicating that
the CDF detector simulation slightly overestimates the
electron identification efficiency in the plug region of the
CDF detector. The determination of this correction
factor is dominated by the Vista final states ep̸ and e+e−,
where one of the electrons has |η| > 1.

0027, 0028. To reduce backgrounds hypothesized to
arise from pion and kaon decays in flight with a
substantially mismeasured track, a very good track fit in the
CDF tracker is required. Partially due to this tight track
fit requirement, CDF muon identification efficiencies in
the regions |η| < 0.6 and 0.6 < |η| < 1.5 are
overestimated in the CDF detector simulation by over 10%. The
determination of the identification efficiencies p(μ→μ) is
dominated by the Vista final states μp̸ and μ+μ−.

0029. The central photon identification efficiency
scale factor is determined primarily by the number of
events in the Vista final states jγ and γγ. The
uncertainty on this correction factor is highly correlated with
the uncertainties on the γj k-factor, the p(j→γ) fake
rate, and the γγ k-factor.

0030. The plug photon identification efficiency scale
factor is determined primarily by the number of events
in the Vista final state γγ. The uncertainty on this
correction factor is highly correlated with the uncertainty
on the plug p(j→γ) fake rate.

0031. The b-jet identification efficiency is determined
to be consistent with the prediction from CdfSim.



c. Fake rates 

0032. The fake rate p(e→γ) for electrons to be
misreconstructed as photons in the plug region of the
detector is added on top of the significant number of electrons
misreconstructed as photons by CdfSim.

0033. In Vista, the contribution of jets faking
electrons is modeled by applying a fake rate p(j→e) to
Monte Carlo jets. Vista represents the first large
scale Tevatron analysis in which a completely Monte
Carlo based modeling of jets faking electrons is
employed. Significant understanding of the physical
mechanisms contributing to this fake rate has been achieved,
as summarized in Appendix A 1. Consistency with
this understanding is required; for example,
p(j→e) ≈ p(j→γ) p(γ→e). The value of this correction factor is
determined primarily by the number of events in the Vista
final state ej, where the electron is identified in the
central region of the CDF detector. It is notable that this
fake rate is independent of global event properties, and
that a consistent simultaneous understanding of the ej,
e2j, e3j, and e4j final states is obtained.

0034. The value of the fake rate p(j→e) in the plug
region of the CDF detector is roughly one order of
magnitude larger than the corresponding fake rate p(j→e)
in the central region of the detector, consistent with an
understanding of the relative performance of the
detector in the central and plug regions for the identification
of electrons. This correction factor is determined
primarily by the number of events in the Vista final state ej,
where the electron is identified in the plug region of the
CDF detector.

0035. In Vista, the contribution of jets faking
muons is modeled by applying a fake rate p(j→μ)
to Monte Carlo jets. Vista represents the first large
scale Tevatron analysis in which a completely Monte
Carlo based modeling of jets faking muons is employed.
The value obtained from the Vista fit is seen to be
roughly one order of magnitude smaller than the fake
rate p(j→e) in the central region of the detector,
consistent with our understanding of the physical mechanisms
underlying these fake rates, as described in Appendix A 1.
The value of this correction factor is determined
primarily by the number of events in the Vista final state jμ.

0036. The fake rate p(j→b) has pT dependence
explicitly imposed. The number of tracks inside a typical
jet, and hence the probability that a secondary vertex
is (mis)reconstructed, increases with jet pT. The values
of these correction factors are consistent with the mistag
rate determined using secondary vertices reconstructed
on the other side of the beam axis with respect to the
direction of the tagged jet [49]. The value of this correction
factor is determined primarily by the number of events
in the Vista final states bj and bb.

0037, 0038. The fake rate p(j→τ) decreases with jet
pT, since the number of tracks inside a typical jet
increases with jet pT. The values of these correction factors
are determined primarily by the number of events in the
Vista final state jτ.

0039, 0040. The fake rate p(j→γ) is determined
separately in the central and plug regions of the CDF
detector. The values of these correction factors are
determined primarily by the number of events in the Vista
final states jγ and γγ. The value obtained for 0039 is
consistent with the value obtained from a study using
detailed information from the central preshower detector.
The fake rate determined in the plug region is
noticeably higher than the fake rate determined in the central
region, as expected.



APPENDIX B: SLEUTH DETAILS 

This appendix elaborates on the Sleuth partitioning 
rule, and on the minimum number of events required for 
a final state to be considered by Sleuth. 



1. Partitioning 

Table VI lists the Vista final states associated with
each Sleuth final state.



d. Trigger efficiencies 

0041. The central electron trigger inefficiency is
dominated by not correctly reconstructing the electron's track
at the first online trigger level.

0042. The plug electron trigger inefficiency is due to 
inefficiencies in clustering at the second online trigger 
level. 

0043, 0044. The muon trigger inefficiencies in the
regions |η| < 0.6 and 0.6 < |η| < 1.0 derive partly from
tracking inefficiency, and partly from an inefficiency in
reconstructing muon stubs in the CDF muon chambers.

The values of these correction factors are consistent
with other trigger efficiency measurements made
using additional information.



e. Energy scales 

The Vista infrastructure also allows the jet energy 
scale to be treated as a correction factor. At present this 
correction factor is not used, since there is no discrepancy 
requiring it. 

To understand the effect of introducing such a
correction factor, a jet energy scale correction factor is added
and constrained to 1 ± 0.03, reflecting the jet energy scale
determination at CDF. The fit returns a value with
a very small error, since this correction factor is highly
constrained by the low-pT 2j, 3j, ej, and e2j final states.
Assuming perfectly correct modeling of jets faking
electrons, as described in Appendix A 1, this is a correct
energy scale error. The inclusion of additional correction
factor degrees of freedom to reflect possible imperfections
in this modeling of jets faking electrons increases the
energy scale error. The interesting conclusion is that the
jet energy scale (considered as a lone free parameter) is
very well constrained by the large number of dijet events;
adjustment to the jet energy scale must be accompanied
by simultaneous adjustment of other correction factors
(such as the dijet k-factor) in order to retain agreement
with data.



2. Minimum number of events 

This section expands on a subtle point in the definition
of the Sleuth algorithm: for purely practical
considerations, only final states in which three or more events are
observed in the data are considered.

Suppose $\mathcal{P}_{e^+e^-b\bar{b}} = 10^{-6}$; then in computing
$\tilde{\mathcal{P}}$ all final states with $b > 10^{-6}$ must be
considered and accounted for. (A final state with $b = 10^{-7}$, on
the other hand, counts as only 0.1 final states, since the fraction of
hypothetical similar experiments in which $\mathcal{P} < 10^{-6}$ in
this final state is equal to the fraction of hypothetical
similar experiments in which one or more events is seen
in this final state, which is $10^{-7}$.) This is a large practical
problem, since it requires that all final states with
$b > 10^{-6}$ be enumerated and estimated, and it is difficult to
do this believably.

To solve this problem, let Sleuth consider only final
states with at least $d_{\min}$ events observed in the data. The
goal is to be able to find $\tilde{\mathcal{P}} < 10^{-3}$. There will
be some number $N_{\rm fs}(b_{\min})$ of final states with expected
number of events $b > b_{\min}$, writing $N_{\rm fs}$ explicitly as a
function of $b_{\min}$; thus $b_{\min}$ must be chosen to be
sufficiently large that all of these $N_{\rm fs}(b_{\min})$ final
states can be enumerated and estimated. The time cost of simulating
events is such that the integrated luminosity of Monte Carlo events is
at most 100 times the integrated luminosity of the data; this
practical constraint restricts $b_{\min} > 0.01$. The number of
Sleuth Tevatron Run II final states with $b > 0.01$ is
$N_{\rm fs}(b_{\min} = 0.01) \approx 10^{3}$.

For small $\mathcal{P}_{\min}$, keeping the first term in a binomial
expansion yields $\tilde{\mathcal{P}} = \mathcal{P}_{\min} N_{\rm fs}(b_{\min})$,
where $\mathcal{P}_{\min}$ is the smallest $\mathcal{P}$ found in any
final state. From the discussion above, the computation of
$\tilde{\mathcal{P}}$ from $\mathcal{P}_{\min}$ can only be justified
if $\mathcal{P}_{\min} > (b_{\min})^{d_{\min}}$; if otherwise, final
states with $b < b_{\min}$ will need to be accounted for. Thus
$\tilde{\mathcal{P}}$ can be confidently computed only if
$\tilde{\mathcal{P}} > (b_{\min})^{d_{\min}} N_{\rm fs}(b_{\min})$.
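
To make the first-order approximation concrete, the sketch below compares the trials-corrected probability with the product of the smallest per-final-state probability and the number of final states. The exact expression used here, one minus the probability that none of N independent final states fluctuates at least as far, is an assumption of this sketch about the expression being expanded, and the numerical inputs are illustrative only.

```python
def trials_corrected_probability(p_min: float, n_final_states: float) -> float:
    """Chance that at least one of n independent final states yields P <= p_min.

    Assumes statistically independent final states (an assumption of this sketch).
    """
    return 1.0 - (1.0 - p_min) ** n_final_states


def first_order_approximation(p_min: float, n_final_states: float) -> float:
    """First term of the binomial expansion: p_min times the number of final states."""
    return p_min * n_final_states


# Illustrative inputs, not results from the article.
p_min, n_fs = 1e-6, 1e3
exact = trials_corrected_probability(p_min, n_fs)
approx = first_order_approximation(p_min, n_fs)
print(f"exact = {exact:.6e}, first-order = {approx:.6e}")
```

For these inputs the two values differ only at the sub-percent level, which is why the first term suffices when the smallest probability is small.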

Solving this inequality for $d_{\min}$ and inserting values
from above,

$$ d_{\min} > \frac{\log_{10}\tilde{\mathcal{P}} - \log_{10} N_{\rm fs}(b_{\min})}{\log_{10} b_{\min}} = \frac{-3 - 3}{-2} = 3. \qquad \text{(B1)} $$

A believable trials factor can be computed if $d_{\min} \geq 3$.
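
The arithmetic of this bound can be checked with a few lines of Python. This is a minimal sketch: the function name and argument names are hypothetical, and the inputs are the values quoted in the text (a target trials-corrected probability of 10^-3, about 10^3 final states, and a minimum expected background of 0.01).

```python
import math


def minimum_events_bound(p_tilde_target: float,
                         n_final_states: float,
                         b_min: float) -> float:
    """Bound on the minimum observed events, obtained by solving
    p_tilde > b_min**d_min * n_final_states for d_min.

    Dividing by log10(b_min) flips the inequality, so b_min must be below one.
    """
    if not 0.0 < b_min < 1.0:
        raise ValueError("b_min must lie strictly between 0 and 1")
    numerator = math.log10(p_tilde_target) - math.log10(n_final_states)
    return numerator / math.log10(b_min)


bound = minimum_events_bound(p_tilde_target=1e-3, n_final_states=1e3, b_min=0.01)
print(f"d_min bound = {bound:.3f}")
```

Raising the available Monte Carlo statistics (a smaller minimum background) or relaxing the target probability lowers the bound, whereas a larger number of final states raises it.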
TABLE VI: Correspondence between Sleuth and Vista final states. The first column shows the Sleuth final state formed
by merging the populated Vista final states in the second column. Charge conjugates of each Vista final state are implied.



At the other end of the scale, computational strength
limits the maximum number of events Sleuth is able
to consider to $\lesssim 10^{4}$. Excesses in which the number of
events exceeds $10^{4}$ are expected to be identified by Vista's
normalization statistic.

For each final state, pseudo experiments are run until
$\mathcal{P}$ is determined to within a fractional precision of 5% or
a time limit is exceeded. If the time limit is exceeded
before $\mathcal{P}$ is determined to within the desired fractional
precision of 5%, Sleuth returns an upper bound on $\mathcal{P}$,
and indicates explicitly that only an upper bound has
been determined. For the data described in this article,
the desired precision is obtained.
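
The stopping rule described above can be sketched as a simple sequential Monte Carlo loop. This is not the Sleuth implementation: the batch size, the binomial estimate of the fractional error, and the form of the upper bound returned on timeout are all assumptions of this sketch, and `trial` stands in for a hypothetical function that runs one pseudo experiment and reports whether it is at least as interesting as the data.

```python
import math
import random
import time
from typing import Callable, Tuple


def estimate_fraction(trial: Callable[[], bool],
                      precision: float = 0.05,
                      time_limit_s: float = 1.0,
                      batch: int = 1000) -> Tuple[float, bool]:
    """Estimate the fraction of pseudo experiments for which trial() is True.

    Returns (value, is_upper_bound). The loop stops once the binomial
    fractional error on the estimate drops to `precision`, or when the
    time limit is exceeded, in which case only an upper bound is returned.
    """
    hits = 0
    n = 0
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        for _ in range(batch):
            if trial():
                hits += 1
        n += batch
        if hits > 0:
            p = hits / n
            # sigma_p / p for a binomial proportion: sqrt((1 - p) / hits)
            if math.sqrt((1.0 - p) / hits) <= precision:
                return p, False
    if n == 0:
        return 1.0, True
    # Crude rule-of-three style bound; an assumed form, not taken from the article.
    return min(1.0, (hits + 3.0) / n), True


random.seed(0)
value, is_upper_bound = estimate_fraction(lambda: random.random() < 0.3,
                                          time_limit_s=30.0)
print(value, is_upper_bound)
```

Since the fractional error scales as the inverse square root of the number of successes, reaching 5% precision takes roughly 400 successes for a small probability, so the run time grows inversely with the probability being estimated.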
the missing transverse momentum to less than 10 GeV.
hadronic energy clusters in the event, and the hadronic
energy resolution of the CDF detector has been
approximated as $100\%\sqrt{p_T}$, expressed in GeV. An event is said
to contain missing transverse momentum if p̸T > 17 GeV
and p̸T′ > 10 GeV.
$w_j$, so that the total standard model prediction from
these Monte Carlo events is ${\rm SM} = \sum_j w_j$.
"effective weight" $w_{\rm eff}$ of these events can be taken to
be the weighted average of the weights:
$w_{\rm eff} = \sum_j w_j w_j / \sum_j w_j$.
The "effective number of Monte Carlo events" is
$N_{\rm eff} = {\rm SM}/w_{\rm eff}$, and the error on the standard
model prediction is $\delta{\rm SM} = {\rm SM}/\sqrt{N_{\rm eff}}$.
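
A minimal sketch of these quantities follows, assuming that the weighted average of the weights uses the weights themselves as the weighting, so that the effective weight is the sum of squared weights divided by the sum of weights. The function name and the example weights are hypothetical.

```python
import math
from typing import Sequence, Tuple


def effective_mc_statistics(weights: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (prediction, effective weight, effective event count, error).

    prediction = sum of weights
    effective weight = sum(w**2) / sum(w)   (assumed form of the weighted average)
    effective event count = prediction / effective weight
    error = prediction / sqrt(effective event count)
    """
    if not weights:
        raise ValueError("at least one Monte Carlo event is required")
    prediction = sum(weights)
    w_eff = sum(w * w for w in weights) / prediction
    n_eff = prediction / w_eff
    error = prediction / math.sqrt(n_eff)
    return prediction, w_eff, n_eff, error


# Four hypothetical Monte Carlo events, each carrying weight 0.5.
print(effective_mc_statistics([0.5, 0.5, 0.5, 0.5]))
```

With this choice the error reduces to the square root of the sum of squared weights, which is the usual statistical uncertainty on a sum of weighted events.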

Final states for which p > 0.5 after accounting for the
trials factor are not even mildly interesting, and the
corresponding σ after accounting for the trials factor is
not quoted. For the mildly interesting final states with
p < 0.5 after accounting for the trials factor, σ is quoted
as positive if the number of observed data events
exceeds the standard model prediction, and negative if the
number of observed data events is less than the standard
model prediction.
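
The sign convention can be sketched as follows. The conversion from probability to units of standard deviation used here (a one-sided Gaussian tail) and the treatment of an exact tie between observation and prediction are assumptions of this sketch; the text specifies only when the value is quoted and which sign it carries.

```python
from statistics import NormalDist
from typing import Optional


def signed_sigma(p: float, n_observed: float, sm_prediction: float) -> Optional[float]:
    """Signed significance for a final state, or None when it is not quoted.

    p is the probability after accounting for the trials factor and must lie
    strictly between 0 and 1. Values with p > 0.5 are not quoted.
    """
    if p > 0.5:
        return None
    # One-sided Gaussian conversion (assumed); -inv_cdf(p) stays accurate for tiny p.
    magnitude = -NormalDist().inv_cdf(p)
    # Positive for an excess, negative otherwise (ties treated as deficits here).
    return magnitude if n_observed > sm_prediction else -magnitude


print(signed_sigma(0.01, n_observed=30, sm_prediction=18.5))
print(signed_sigma(0.01, n_observed=10, sm_prediction=18.5))
print(signed_sigma(0.80, n_observed=20, sm_prediction=18.5))
```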